ABSTRACT

In this work, we initiate the investigation of optimization opportunities in collaborative crowdsourcing. Many popular applications, such as collaborative document editing, sentence translation, and citizen science, resort to this special form of human-based computing, where crowd workers with appropriate skills and expertise are required to form groups to solve complex tasks. Central to any collaborative crowdsourcing process is successful collaboration among the workers, which, for the first time, is formalized and then optimized in this work. Our formalism considers two main collaboration-related human factors, affinity and upper critical mass, appropriately adapted from organizational science and social theories. Our contributions are (a) a comprehensive model for collaborative crowdsourcing optimization, (b) rigorous theoretical analyses to understand the hardness of the proposed problems, and (c) an array of efficient exact and approximation algorithms with provable theoretical guarantees. Finally, we present a detailed set of experimental results stemming from two real-world collaborative crowdsourcing applications using Amazon Mechanical Turk, and conduct synthetic data analyses on the scalability and qualitative aspects of our proposed algorithms. Our experimental results demonstrate the efficacy of our proposed solutions.


1. INTRODUCTION 

The synergistic effect of collaboration in group based activities is widely accepted in socio-psychological research and traditional team based activities [19, 18, 4]. The very fact that the collective yield of a group is higher than the sum of the contributions of the individuals is often described as “the whole is greater than the sum of its parts” [19, 18]. Despite its immense potential, the transformative effect of “collaboration” remains largely unexplored in crowdsourcing [29] complex tasks (such as document editing, product design, sentence translation, citizen science), which are acknowledged as some of the most promising areas of next generation crowdsourcing. In this work, we investigate the optimization aspects of this specific form of human-based computation, which involves people working in groups to solve complex problems that require collaboration and a variety of skills. We believe our work is also the first to formalize optimization in collaborative crowdsourcing.

The optimization goals of collaborative crowdsourcing are akin to those of its traditional micro-task based counterparts [16, 21]: quickly maximize the quality of the completed tasks, while minimizing cost, by assigning appropriate tasks to appropriate workers. However, the “plurality optimization” based solutions typically designed for micro-task based crowdsourcing are inadequate for optimizing collaborative tasks, as the latter require workers with certain skills to work in groups and “build” on each other’s contributions for tasks that do not typically have “binary” answers. Prior work in collaborative crowdsourcing has proposed the importance of human factors to characterize workers, such as workers’ skills and wages [42, 43]. Additional human factors, such as worker-worker affinity [47, 30], are also acknowledged to quantify workers’ collaboration effectiveness. Similarly, social theories widely underscore the importance of upper critical mass [27] for group collaboration, which is a constraint on the size of groups beyond which the collaboration effectiveness diminishes [27, 39]. However, no further attempts have been made to formalize this variety of human factors in a principled manner to optimize the outcome of a collaborative crowdsourcing environment.

Our first significant contribution lies in appropriately incorporating the interplay of this variety of complex human factors into a set of well-formulated optimization problems. To achieve the aforementioned optimization goals, it is essential to form, for each task, a group of workers who collectively hold the skills required for the task, collectively cost less than the task’s budget, and collaborate effectively. Using the notions of affinity and upper critical mass, we formalize the flat model of work coordination [26] in collaborative crowdsourcing as a graph with nodes representing workers and edges labeled with pairwise affinities. A group of workers is a clique in the graph whose size does not surpass the critical mass imposed by a task. A large clique (group) may further be partitioned into subgroups (each a clique of smaller size satisfying critical mass) to complete a task because of the task’s magnitude. Each clique has an intra-affinity and an inter-affinity that measure, respectively, the level of cohesion that the clique has internally and with other cliques. A clique with high intra-affinity implies that its members collaborate well with one another. A high inter-affinity between two cliques implies that these two groups of workers work well together. Our optimization problem reduces to finding a clique that maximizes intra-affinity, satisfies the skill threshold across multiple domains, satisfies the cost limit, and maximizes inter-affinity when partitioned into smaller cliques. We note that no existing work on team formation in social networks [3, 33] or collaborative crowdsourcing [29, 47, 30] has attempted similar formulations.

Our second endeavor is computational. We show that solving the complex optimization problem explained above is prohibitively expensive and incurs very high machine latency. Such high latency is unacceptable for a real-time crowdsourcing platform. Therefore, we propose an alternative strategy, Grp&Splt, that decomposes the overall problem into two stages and is a natural alternative to our original problem formulation. Even though this staged formulation is also computationally intractable in the worst case, it allows us to design instance optimal exact algorithms that work well in the average case, as well as efficient approximation algorithms with provable bounds. In stage-1 (referred to as Grp), we first form a single group of workers by maximizing intra-affinity, while satisfying the skill and cost thresholds. In stage-2 (referred to as Splt), we decompose this large group into smaller subgroups, such that each satisfies the group size constraint (imposed by critical mass) and the inter-affinity across subgroups is maximized. Although the stage-1 problem is NP-hard [13], we propose an instance optimal exact algorithm OptGrp and a novel 2-approximation algorithm ApprxGrp for it. Similarly, we prove the NP-hardness of a variant of the stage-2 problem and propose a 3-approximation algorithm, Min-Star-Partition, for it.

Finally, we conduct a comprehensive experimental study with two different applications (sentence translation and collaborative document editing) using real world data from Amazon Mechanical Turk and present rigorous scalability and quality analyses using synthetic data. Our experimental results demonstrate that our formalism is effective in aptly modeling the behavior of collaborative crowdsourcing and that our proposed solutions are scalable.

In summary, this work makes the following contributions: 

1. Formalism: We initiate the investigation of optimization opportunities in collaborative crowdsourcing, and identify and incorporate a variety of human factors in well-formulated optimization problems.

2. Algorithmic contributions: We present a comprehensive theoretical analysis of our problems and approaches. We analyze the computational complexity of our problems, and propose a principled staged solution. We propose exact instance optimal algorithms as well as efficient approximation algorithms with provable approximation bounds.

3. Experiments: We present a comprehensive set of experimental results (two real applications as well as synthetic experiments) that demonstrate the effectiveness of our proposed solutions.

The paper is organized as follows. Sections 2, 3, and 4 discuss a database application of collaborative crowdsourcing, our data model, problem formalization, and initial solutions. Sections 5 and 6 describe our theoretical analyses and proposed algorithmic solutions. Experiments are described in Section 7, related work in Section 8, and conclusions in Section 9. Additional results are presented in the appendix.


2. AN APPLICATION 

Sentence translation is a frequently encountered application of collaborative crowdsourcing, where the objective is to use the crowd to build a translation database of sentences in different languages. Such databases later serve as the “training dataset” for supervised machine learning algorithms for automated sentence translation.



        u1     u2     u3     u4     u5     u6
d1      0.66   1.0    0.53   0.0    0.13   0.0
d2      0.0    0.0    0.66   0.73   0.66   0.13
d3      0.0    0.33   0.53   0.0    0.8    0.93
Wage    0.4    0.3    0.7    0.8    0.5    0.8

Table 1: Workers skill and wage table


As a running example for this paper, consider a translation task $t$ designed for translating an English video clip to French. Typically, such translation tasks follow a 3-step process [47, 30]: English speakers first translate the video in English, professional editors edit the translation, and finally workers with proficiency in both English and French translate English to French. Consequently, such a task requires skills in 3 different domains: English comprehension ($d_1$), English editing ($d_2$), and French translation ability ($d_3$).

In our optimization setting, each task $t$ has a minimum skill requirement per domain and a maximum cost budget, and workers should collaborate with each other (e.g., to correct each other’s mistakes [47]); the collaboration effectiveness is quantified as the affinity of the group. Some aspects of our formulation have similarities with team formation problems in social networks [3]. The notion of affinity has been identified in the related work on sentence translation tasks [47, 30], as well as team formation problems [3].

However, if the group is “too large”, the effectiveness of collective actions diminishes [27, 39] while undertaking the translation task, as an unwieldy group of workers fails to find effective assistance from their peers [47, 30]. Therefore, each task $t$ is associated with a corresponding upper critical mass constraint on the size of an effective group, i.e., a large group may need to be further decomposed into multiple subgroups in order to satisfy that constraint. Studying the importance of the upper critical mass constraint in the crowdsourcing context, as well as how to set its (application-specific) value, are important challenges that are best left to domain experts; however, we experimentally study this issue for certain applications such as sentence translation.

When this task arrives, imagine that there are 6 workers $u_1, u_2, \ldots, u_6$ available in the crowdsourcing platform. Each worker has a skill value on each of the three skill domains described above, and a wage they expect. Additionally, the workers’ cohesiveness or affinity is also provided. These human factors of the workers are summarized in Tables 1 and 2, and the task requirements of $t$ (including thresholds on aggregated skill for each domain, total cost, and critical mass) are presented in Table 3 and are further described in the next section.

The objective is to form a “highly cohesive” group $\mathcal{G}$ of workers that satisfies the lower bound of skill and the upper bound of cost of the task. Due to the upper critical mass constraint, $\mathcal{G}$ may further be decomposed into multiple subgroups. After that, each subgroup undertakes a subset of sentences to translate. Once all the subgroups finish their respective efforts, their contributions are merged. Therefore, both the overall group and its subgroups must be cohesive. Incorporating upper critical mass makes our problem significantly different from the body of prior work [3], as we may have to create a group that is further decomposed into multiple subgroups, instead of a single group.















        u1     u2     u3     u4     u5     u6
u1      0.0    1.0    0.66   0.66   0.85   0.66
u2      1.0    0.0    0.66   0.85   0.66   0.85
u3      0.66   0.66   0.0    0.4    0.66   0.40
u4      0.66   0.85   0.4    0.0    0.4    0.0
u5      0.85   0.66   0.66   0.4    0.0    0.4
u6      0.66   0.85   0.4    0.0    0.4    0.0

Table 2: Workers Distance Matrix


Q1     Q2     Q3     C      K
1.8    1.4    1.66   3.0    3

Table 3: Task Description


3. DATA MODEL 

We introduce our data model and preliminaries that will 
serve as a basis for our problem definition. 

3.1 Preliminaries 

Domains: We are given a set of domains $D = \{d_1, d_2, \ldots, d_m\}$ denoting knowledge topics. In the running example of Section 2, there are 3 different domains: English comprehension ($d_1$), English editing ($d_2$), and French translation ability ($d_3$).

Workers: We assume a set $U = \{u_1, u_2, \ldots, u_n\}$ of $n$ workers available in the crowdsourcing platform. The example in Section 2 describes a crowdsourcing platform with 6 workers.

Worker Group: A worker group $\mathcal{G}$ consists of a subset of workers from $U$, i.e., $\mathcal{G} \subseteq U$.

Skills: A skill is the knowledge on a particular skill domain in $D$, quantified on a continuous $[0,1]$ scale. It is associated with workers and tasks. The skill of a worker represents the worker’s expertise/ability on a topic. The skill of a task represents the minimum knowledge requirement/quality for that task. A value of 0 for a skill reflects no expertise of a worker for that skill. For a task, 0 reflects no requirement for that skill.

How to learn the skill of the workers is an important and independent research problem in its own right. Most related work has relied on learning the skill of the workers from “gold-standard” or benchmark datasets using pre-qualification tests [10, 20]. As we describe in detail in Section 7, we also learn the skill of the workers by designing pre-qualification tests using benchmark datasets.

Collaborative Tasks: A collaborative task $t$ has the following characteristics: a minimum knowledge threshold $Q_i$ per domain $d_i$ in $D$, a maximum cost budget $C$ for hiring workers to achieve $t$, and an upper critical mass $K$, denoting the maximum number of workers who can effectively collaborate inside a group to complete $t$. Specifically, $t$ is characterized by a vector $(Q_1, Q_2, \ldots, Q_m, C, K)$ of length $m + 2$. For the example in Section 2, there are 3 domains ($m = 3$); their respective skill requirements, the cost $C$, and the critical mass $K$ of the task are described in Table 3. A task is considered complete if it attains its skill requirement over all domains and satisfies all the constraints.

3.2 Human Factors 

A worker is described by a set of human factors. We consider two types of factors: factors that describe an individual worker’s characteristics and factors that characterize an individual’s ability to work with fellow workers. Our contribution is in appropriately adapting these factors to collaborative crowdsourcing from multi-disciplinary prior work such as team formation [3, 33] and psychology research [27, 39].


3.2.1 Individual Human Factors: Skill and Wage 

Individual workers in a crowdsourcing environment are 
characterized by their skill and wage. 

Skill: For each knowledge domain $d_i$, $u_{d_i} \in [0,1]$ is the expertise level of worker $u$ in $d_i$. Skill expertise reflects the quality that the worker’s contribution has on a task accomplished by that worker.

Wage: $w_u \in [0,1]$ is the minimum amount of compensation for which a worker $u$ is willing to complete a task. We choose a simple model where a worker specifies a single wage value independent of the task at hand.

Table 1 presents the respective skills of the 6 workers in 3 different domains and their individual wages for the running example.


3.2.2 Group-based Human Factors: Affinities 

Although related work in collaborative crowdsourcing acknowledges the importance of workers’ affinity to enable effective collaboration [47, 30], there is no attempt to formalize the notion any further. A worker’s effectiveness in collaborating with her fellow workers is measured as affinity. We adopt an affinity model similar to group formation problems in social networks [34, 3], where the atomic unit of affinity is pairwise, i.e., a measure of cohesiveness between every pair of workers. After that, we propose different ways to capture intra-group and inter-group affinities.

Pairwise affinity: The affinity between two workers $u_i$ and $u_j$, $\mathit{aff}(u_i, u_j)$, can be calculated by capturing the similarity between workers using simple socio-demographic attributes, such as region, age, gender, as done in previous work [47], as well as more complex psychological characteristics [40]. For our purpose, we normalize pairwise affinity values to fit in $[0,1]$ and use a notion of worker-worker distance instead, where $\mathit{dist}(u_i, u_j) = 1 - \mathit{aff}(u_i, u_j)$. Thus a smaller distance between workers ensures a better collaboration. Table 2 presents the pairwise distances of all 6 workers for the running example in Section 2. As will be clear later, the notion of distance rather than affinity enables the design of better algorithms for our purposes.

Intra-group affinity: For a group $\mathcal{G}$, its intra-group affinity measures the collaboration effectiveness among the workers in $\mathcal{G}$. Here again we use distance and compute intra-group distance in one of two natural ways: computing the diameter of $\mathcal{G}$ as the largest distance between any two workers in $\mathcal{G}$, or aggregating all-pair worker distances in $\mathcal{G}$:

\[ \mathit{DiaDist}(\mathcal{G}) = \max_{\forall u_i, u_j \in \mathcal{G}} \mathit{dist}(u_i, u_j) \]
\[ \mathit{SumDist}(\mathcal{G}) = \sum_{\forall u_i, u_j \in \mathcal{G}} \mathit{dist}(u_i, u_j) \]

For both definitions, a smaller value is better.
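The two intra-group measures can be illustrated with a minimal Python sketch on the distances of Table 2. The function names `dia_dist` and `sum_dist` and the 0-based worker indexing are illustrative choices, and the sum is taken over unordered worker pairs, which is an assumption about how "all-pair" is meant.

```python
from itertools import combinations

# Pairwise worker distances from Table 2 (workers u1..u6 are indexed 0..5).
DIST = [
    [0.0, 1.0, 0.66, 0.66, 0.85, 0.66],
    [1.0, 0.0, 0.66, 0.85, 0.66, 0.85],
    [0.66, 0.66, 0.0, 0.4, 0.66, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
    [0.85, 0.66, 0.66, 0.4, 0.0, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
]


def dia_dist(group, dist=DIST):
    """Largest pairwise distance in the group (0.0 for fewer than two workers)."""
    return max((dist[i][j] for i, j in combinations(group, 2)), default=0.0)


def sum_dist(group, dist=DIST):
    """Sum of distances over unordered worker pairs in the group."""
    return sum(dist[i][j] for i, j in combinations(group, 2))


group = [0, 1, 3]  # u1, u2, u4
print(dia_dist(group))            # 1.0
print(round(sum_dist(group), 2))  # 2.51
```

For the group {u1, u2, u4}, the diameter is driven by the single pair (u1, u2), whereas the sum accumulates all three pairwise distances.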

Inter-group affinity: When a group violates the upper critical mass constraint [27], it needs to be decomposed into multiple smaller ones. The resulting subgroups need to work together to achieve the task. Given two subgroups $G_1, G_2$ split from a large group $\mathcal{G}$, their collaboration effectiveness is captured by computing their inter-group affinity. Here again, we use distance instead of affinity. More concretely, the inter-group distance is defined in one of two natural ways: either the largest distance between any two workers across the subgroups, or the aggregation of all pairwise worker distances across the subgroups:

\[ \mathit{DiaInterDist}(G_1, G_2) = \max_{\forall u_i \in G_1, u_j \in G_2} \mathit{dist}(u_i, u_j) \]
\[ \mathit{SumInterDist}(G_1, G_2) = \sum_{\forall u_i \in G_1, u_j \in G_2} \mathit{dist}(u_i, u_j) \]

This can be generalized to more than two subgroups: if there are $x$ subgroups, the overall inter-group affinity is the summation of the inter-group affinities over all $\binom{x}{2}$ possible pairs.
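As a rough illustration of the inter-group measures, the sketch below evaluates them on the Table 2 distances, including the generalization to more than two subgroups. The function names and the 0-based indexing are hypothetical and chosen only for this example.

```python
from itertools import combinations, product

# Pairwise worker distances from Table 2 (workers u1..u6 are indexed 0..5).
DIST = [
    [0.0, 1.0, 0.66, 0.66, 0.85, 0.66],
    [1.0, 0.0, 0.66, 0.85, 0.66, 0.85],
    [0.66, 0.66, 0.0, 0.4, 0.66, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
    [0.85, 0.66, 0.66, 0.4, 0.0, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
]


def dia_inter_dist(g1, g2, dist=DIST):
    """Largest distance between a worker of g1 and a worker of g2."""
    return max(dist[i][j] for i, j in product(g1, g2))


def sum_inter_dist(g1, g2, dist=DIST):
    """Sum of distances over all cross pairs (one worker from each subgroup)."""
    return sum(dist[i][j] for i, j in product(g1, g2))


def total_sum_inter_dist(subgroups, dist=DIST):
    """SumInterDist added up over every unordered pair of subgroups."""
    return sum(sum_inter_dist(a, b, dist) for a, b in combinations(subgroups, 2))


g1, g2 = [0, 1, 3], [2, 5]  # {u1, u2, u4} and {u3, u6}
print(dia_inter_dist(g1, g2))                                   # 0.85
print(round(sum_inter_dist(g1, g2), 2))                         # 3.23
print(round(total_sum_inter_dist([[0, 1], [3], [2, 5]]), 2))    # 4.74
```

Splitting the same five workers into three subgroups instead of two raises the summed inter-distance, since more worker pairs end up in different subgroups.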


4. OPTIMIZATION 

Problem Settings: For each collaborative task, we intend to form the most appropriate group of workers from the available worker pool. A collaborative crowdsourcing task has skill requirements in multiple domains and a cost budget, which is similar to the requirements of collaborative tasks in team formation problems [34]. Then, we adapt the “flat-coordination” model of worker interactions, which is considered important in prior work in team formation [3] as the “coordination cost”, or in collaborative crowdsourcing [47] itself as the “Turker-turker” affinity model. However, unlike previous work, we attempt to fully explore the potential of “group synergy” [45] and how it yields the maximum qualitative effects in group based efforts by maximizing affinity among the workers (or minimizing distance). Finally, we intend to investigate the effect of upper critical mass in the context of collaborative crowdsourcing as a constraint on group size, beyond which the group must be decomposed into multiple subgroups that are cohesive inside and across. Indeed, our objective function is designed to form a group (possibly further decomposed into a set of subgroups) to undertake a specific task that achieves the highest qualitative effect, while satisfying the cost constraint.

(1) Qualitative effect of a group: Intuitively, the overall qualitative effect of a group formed to undertake a specific task is a function of the skill of the workers and their collaboration effectiveness. Learning this function itself is challenging, as it requires access to adequate training data and domain knowledge. In our initial effort, we therefore make a reasonable simplification, where we seek to maximize group affinity and pose quality as a hard constraint (see footnote 1). Existing literature (indicatively [45]) informs us that aggregation is a mechanism that turns private judgments (in our case individual workers’ contributions) into a collective decision (in our case the final translated sentences), and is one of the four pillars for the wisdom of the crowds. For complex tasks like sentence translation or document editing, there is no widely accepted mathematical function of aggregation. We choose sum to aggregate the skills of the workers, which must satisfy the lower bound of the quality of the task. This simplest and yet most intuitive function for transforming individual contributions into a collective result has been adopted in many previous works [3, 34, 13]. Moreover, this simpler function allows us to design efficient algorithms. Exploring other complex functions (e.g., a multiplicative function) or learning them is deferred to future work.

(2) Upper critical mass: Sociological theories widely support the notion of “critical mass” [27, 39] by reasoning that large groups are less likely to support collective action. However, whether the effect of “critical mass” should be imposed as a hard constraint, or it should have more of a gradual “diminishing return” effect, is itself a research question. For simplicity, we consider upper critical mass as a hard constraint and evaluate its effectiveness empirically for different values. Exploring more sophisticated functions to capture critical mass is deferred to future work.

1 Notice that posing affinity as a constraint does not fully exploit the effect of “group synergy”.


Problem 1. AffAware-Crowd: Given a collaborative task $t$, the objective is to form a worker group $\mathcal{G}$, further partitioned into a set of $x$ subgroups $G_1, G_2, \ldots, G_x$ (if needed) for the task $t$, that minimizes the aggregated intra-distance of the workers, as well as the aggregated inter-distance across the subgroups of $\mathcal{G}$. $\mathcal{G}$ must satisfy the skill and cost thresholds of $t$, and each subgroup $G_i$ must satisfy the upper critical mass constraint of $t$. Of course, if the group $\mathcal{G}$ itself satisfies the critical mass constraint, no further partitioning of $\mathcal{G}$ is needed, giving rise to a single worker group.

As explained above, the quality of a task is defined as an aggregation (sum) of the skills of the workers [3, 34]. Similarly, the cost of the task is the sum of the wages of all the workers in $\mathcal{G}$.
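The additive quality and cost definitions can be checked on the running example with a short sketch. The helper names and the small tolerance `EPS` are assumptions introduced here for illustration (the tolerance only guards against floating-point rounding in the sums); the data comes from Tables 1 and 3.

```python
# Worker skills (d1, d2, d3) and wages from Table 1; u1..u6 are indexed 0..5.
SKILL = [
    (0.66, 0.0, 0.0),
    (1.0, 0.0, 0.33),
    (0.53, 0.66, 0.53),
    (0.0, 0.73, 0.0),
    (0.13, 0.66, 0.8),
    (0.0, 0.13, 0.93),
]
WAGE = [0.4, 0.3, 0.7, 0.8, 0.5, 0.8]

# Task thresholds from Table 3.
Q = (1.8, 1.4, 1.66)
C = 3.0
EPS = 1e-9  # tolerance for floating-point sums (an implementation choice)


def group_quality(group):
    """Per-domain quality of a group: the sum of its members' skills."""
    return tuple(sum(SKILL[u][d] for u in group) for d in range(len(Q)))


def group_cost(group):
    """Cost of a group: the sum of its members' wages."""
    return sum(WAGE[u] for u in group)


def satisfies_skill_and_cost(group):
    """True if the group meets every skill threshold within the cost budget."""
    quality = group_quality(group)
    skills_ok = all(quality[d] >= Q[d] - EPS for d in range(len(Q)))
    return skills_ok and group_cost(group) <= C + EPS


print(satisfies_skill_and_cost([0, 1, 2, 3, 5]))  # u1, u2, u3, u4, u6 -> True
print(satisfies_skill_and_cost([0, 1, 2, 5]))     # u1, u2, u3, u6 -> False
```

The second group fails because its aggregated skill in $d_2$ (0.79) is below the threshold of 1.4, even though it is cheaper.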

4.1 Optimization Models 

Given the high-level definition above, we propose multiple optimization objective functions based on the different inter- and intra-distance measures defined in Section 3.

For a group $\mathcal{G}$, we calculate intra-distance in one of the two possible ways: DiaDist(), SumDist(). If $\mathcal{G}$ is further partitioned to satisfy the upper critical mass constraint, then we also want to enable strong collaboration across the subgroups by minimizing inter-distance. For the latter, inter-distance is calculated using one of DiaInterDist(), SumInterDist(). Even though there may be many complex formulations to combine these two factors, in our initial effort our overall objective function is a simple sum of these two factors that we wish to minimize. This gives rise to 4 possible optimization objectives.

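• DiaDist(), DiaInterDist():

\[ \text{Minimize } \Big\{ \mathit{DiaDist}(\mathcal{G}) + \max_{\forall G_i, G_j \in \mathcal{G}} \mathit{DiaInterDist}(G_i, G_j) \Big\} \]

• SumDist(), DiaInterDist():

\[ \text{Minimize } \Big\{ \mathit{SumDist}(\mathcal{G}) + \max_{\forall G_i, G_j \in \mathcal{G}} \mathit{DiaInterDist}(G_i, G_j) \Big\} \]

• DiaDist(), SumInterDist():

\[ \text{Minimize } \Big\{ \mathit{DiaDist}(\mathcal{G}) + \sum_{\forall G_i, G_j \in \mathcal{G}} \mathit{SumInterDist}(G_i, G_j) \Big\} \]

• SumDist(), SumInterDist():

\[ \text{Minimize } \Big\{ \mathit{SumDist}(\mathcal{G}) + \sum_{\forall G_i, G_j \in \mathcal{G}} \mathit{SumInterDist}(G_i, G_j) \Big\} \]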






where each of these objective functions has to satisfy the following three constraints on skill, cost, and critical mass, respectively:

\[ \sum_{\forall u \in \mathcal{G}} u_{d_i} \geq Q_i \quad \forall d_i \]
\[ \sum_{\forall u \in \mathcal{G}} w_u \leq C \]
\[ |G_i| \leq K \quad \forall i \in \{1, 2, \ldots, x\} \]

For brevity, the rest of our discussion only considers DiaDist() for intra-distance and SumInterDist() for inter-distance. We refer to this variant of the problem as AffAware-Crowd. We note that our proposed optimal solution in Section 4 could be easily extended to other combinations as well.
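To make the four objective variants concrete, the following sketch evaluates each of them for one fixed partition of workers from Table 2. The function `objective` and its `intra`/`inter` switches are hypothetical names introduced for illustration only; the sketch evaluates a given partition and does not search for the optimum.

```python
from itertools import combinations, product

# Pairwise worker distances from Table 2 (workers u1..u6 are indexed 0..5).
DIST = [
    [0.0, 1.0, 0.66, 0.66, 0.85, 0.66],
    [1.0, 0.0, 0.66, 0.85, 0.66, 0.85],
    [0.66, 0.66, 0.0, 0.4, 0.66, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
    [0.85, 0.66, 0.66, 0.4, 0.0, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
]


def objective(subgroups, intra="dia", inter="sum", dist=DIST):
    """Intra-distance of the whole group plus inter-distance across subgroups."""
    group = [u for g in subgroups for u in g]
    pair_d = [dist[i][j] for i, j in combinations(group, 2)]
    intra_value = max(pair_d, default=0.0) if intra == "dia" else sum(pair_d)

    # One inter-distance term per unordered pair of subgroups.
    terms = []
    for g1, g2 in combinations(subgroups, 2):
        cross = [dist[i][j] for i, j in product(g1, g2)]
        terms.append(max(cross) if inter == "dia" else sum(cross))
    if not terms:  # a single group needs no inter-distance term
        return intra_value
    return intra_value + (max(terms) if inter == "dia" else sum(terms))


def respects_critical_mass(subgroups, k):
    """True if every subgroup has at most k workers."""
    return all(len(g) <= k for g in subgroups)


parts = [[0, 1, 3], [2, 5]]  # {u1, u2, u4} and {u3, u6}
print(respects_critical_mass(parts, 3))          # True
print(round(objective(parts, "dia", "dia"), 2))  # 1.85
print(round(objective(parts, "sum", "dia"), 2))  # 6.99
print(round(objective(parts, "dia", "sum"), 2))  # 4.23
print(round(objective(parts, "sum", "sum"), 2))  # 9.37
```

The variants are on different scales, so their values are not comparable with one another; each defines its own ranking of candidate partitions.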


Theorem 1. Problem AffAware-Crowd is NP-hard [14].


The detailed proof is provided in Section B of the appendix.


4.2 Algorithms for AffAware-Crowd 

Our optimization problem attempts to appropriately capture the complex interplay among various important factors. The proof of Theorem 1 in Section B of the appendix shows that even the simplest variant of the optimization problem is NP-hard. Despite the computational hardness, we attempt to stay as principled as possible in our technical contributions and algorithm design. Towards this end, we propose two alternative directions: (a) We investigate an integer linear programming (ILP) [44] formulation to optimally solve our original overarching optimization problem. We note that even translating the problem to an ILP is non-trivial, because the subgroups inside the large group are also unknown and are determined by the solution. (b) Since the ILP is prohibitively expensive (as our experimental results show), we propose an alternative strategy that is natural to our original formulation, referred to as Grp&Splt. Grp&Splt decomposes the original problem into two phases: in the Grp phase, a single group is formed that satisfies the skill and cost thresholds, but ignores the upper critical mass constraint. Then, in the Splt phase, we partition this large group into a set of subgroups, each satisfying the upper critical mass constraint, such that the sum of all pair inter-distances is minimized. Note that, for many tasks, the Grp stage itself may be adequate, and we may never need to execute Splt. We propose a series of efficient polynomial time approximation algorithms for each phase, each of which has a provable approximation factor. Of course, this staged solution combined together may not have any theoretical guarantees for our original problem formulation. However, our experimental results demonstrate that this formulation is efficient, as well as adequately effective.


4.2.1 ILP for AffAware-Crowd 


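\[
\begin{aligned}
\text{minimize} \quad & V = \max\big\{ e_{(i,i')} \times \mathit{dist}(u_i, u_{i'}) \big\} + \sum_{\forall G_i, G_j \in \mathcal{G}} \; \sum_{\forall u_i \in G_i, u_j \in G_j} e_{(i,j)} \, \mathit{dist}(u_i, u_j) \\
\text{subject to} \quad & \sum_{i=1}^{n} \sum_{j=1}^{x} u_{(i,G_j)} \times u^{i}_{d_l} \geq Q_l \quad \forall l \in [1, m] \\
& \sum_{i=1}^{n} \sum_{j=1}^{x} u_{(i,G_j)} \times w_{u_i} \leq C \\
& \sum_{i=1}^{n} u_{(i,G_j)} \leq K \quad \forall j \in [1, x] \\
& \sum_{j=1}^{x} u_{(i,G_j)} \leq 1 \quad \forall i \in [1, n] \\
& e_{(i,i')} = \begin{cases} 1 & \exists j \in [1, x] \;\&\; u_{(i,G_j)} = 1 \;\&\; u_{(i',G_j)} = 1 \\ 0 & \text{otherwise} \end{cases} \\
& x \in \{0, 1, \ldots, n\} \\
& u_{(i,G_j)} \in \{0, 1\} \quad \forall i \in [1, n], \forall j \in [1, x]
\end{aligned}
\tag{1}
\]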

We discuss the ILP next, as shown in Equation 1. Let $e_{(i,i')}$ denote a Boolean decision variable of whether a user pair $u_i$ and $u_{i'}$ belongs to the same subgroup in group $\mathcal{G}$ or not. Also, imagine that a total of $x$ subgroups ($G_1, G_2, \ldots, G_x$) would be formed for task $t$, where $1 \leq x \leq n$ (i.e., at least one subgroup, which is $\mathcal{G}$ itself, or at most $n$ singleton subgroups could be formed). Then, which subgroup the worker pair should be assigned to must also be determined, where the number of subgroups is unknown in the first place. Note that translating the problem to an ILP is non-trivial and challenging, as the formulation deliberately makes the problem linear by treating each worker pair as an atomic decision variable (as opposed to a single worker), and it also returns the subgroup to which each pair should belong. Once the ILP is formalized, we use a general-purpose solver to solve it. Although the Max operator in the objective function (which expresses DiaDist()) must be translated appropriately in the actual ILP implementation, in our formalism we preserve that abstraction for simplicity.

The objective function returns a group of subgroups that minimizes $\mathit{DiaDist}(\mathcal{G}) + \sum_{\forall G_i, G_j \in \mathcal{G}} \mathit{SumInterDist}(G_i, G_j)$. The first three constraints enforce the skill, cost, and upper critical mass thresholds, whereas the last four constraints ensure the disjointedness of the subgroups and the integrality of the different Boolean decision variables.

When run on the example in Section 2, the ILP generates the optimal solution and creates group $\mathcal{G} = \{u_1, u_2, u_3, u_4, u_6\}$ with two subgroups, $G_1 = \{u_1, u_2, u_4\}$ and $G_2 = \{u_3, u_6\}$. The distance value of the optimization objective is 4.23, which equals $\mathit{DiaDist}(\mathcal{G}) + \mathit{SumInterDist}(G_1, G_2) = 1.0 + 3.23$.
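The reported optimum for the running example can be cross-checked by exhaustive enumeration, which is feasible only because there are 6 workers. The sketch below is not the ILP and is not how the paper solves the problem; it is a brute-force check, with hypothetical helper names, that tries every feasible group and every partition into subgroups of at most $K$ workers.

```python
from itertools import combinations

# Tables 1-3 of the running example (workers u1..u6 are indexed 0..5).
DIST = [
    [0.0, 1.0, 0.66, 0.66, 0.85, 0.66],
    [1.0, 0.0, 0.66, 0.85, 0.66, 0.85],
    [0.66, 0.66, 0.0, 0.4, 0.66, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
    [0.85, 0.66, 0.66, 0.4, 0.0, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
]
SKILL = [(0.66, 0.0, 0.0), (1.0, 0.0, 0.33), (0.53, 0.66, 0.53),
         (0.0, 0.73, 0.0), (0.13, 0.66, 0.8), (0.0, 0.13, 0.93)]
WAGE = [0.4, 0.3, 0.7, 0.8, 0.5, 0.8]
Q, C, K = (1.8, 1.4, 1.66), 3.0, 3
EPS = 1e-9  # tolerance for floating-point sums (an implementation choice)


def feasible(group):
    """Skill thresholds met in every domain and total wage within budget."""
    skills_ok = all(sum(SKILL[u][d] for u in group) >= Q[d] - EPS
                    for d in range(len(Q)))
    return skills_ok and sum(WAGE[u] for u in group) <= C + EPS


def partitions(items, max_size):
    """Yield every partition of items into blocks of at most max_size workers."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Choose the block containing `first`, then partition what is left.
    for k in range(min(max_size - 1, len(rest)) + 1):
        for mates in combinations(rest, k):
            remaining = [w for w in rest if w not in mates]
            for tail in partitions(remaining, max_size):
                yield [[first, *mates]] + tail


def solve():
    """Minimize DiaDist(group) + sum of SumInterDist over subgroup pairs."""
    best_value, best_parts = float("inf"), None
    for size in range(1, len(WAGE) + 1):
        for group in combinations(range(len(WAGE)), size):
            if not feasible(group):
                continue
            dia = max((DIST[i][j] for i, j in combinations(group, 2)),
                      default=0.0)
            for parts in partitions(list(group), K):
                inter = sum(DIST[i][j]
                            for a, b in combinations(parts, 2)
                            for i in a for j in b)
                if dia + inter < best_value:
                    best_value, best_parts = dia + inter, parts
    return best_value, best_parts


value, parts = solve()
print(round(value, 2), parts)
```

On this data the enumeration reaches the same objective value of 4.23 with the workers u1, u2, u3, u4, u6. The split {u1, u2, u6}, {u3, u4} attains the same value, because u4 and u6 have identical distance rows in Table 2, so the optimal partition is not unique.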


4.2.2 Grp&Splt: A Staged Approach 

Our proposed alternative strategy Grp&Splt works as follows: in the Grp stage, we attempt to form a single worker group that minimizes $\mathit{DiaDist}(\mathcal{G})$, while satisfying the skill and cost constraints (and ignoring the upper critical mass constraint). Note that this may result in a large group that violates the upper critical mass constraint. Therefore, in the Splt phase, we partition this big group into multiple smaller subgroups, each satisfying the upper critical mass constraint, in such a way that the aggregated inter-distance between all pairs of subgroups, $\sum_{\forall G_i, G_j} \mathit{SumInterDist}(G_i, G_j)$, is minimized. As mentioned earlier, there are three primary reasons for taking this alternative route: (a) In many cases we may not even need to execute Splt, because the single group formed in the Grp phase abides by the upper critical mass constraint, leading to the solution of the original problem. (b) The original complex ILP is prohibitively expensive. Our experimental results demonstrate that the original ILP does not converge in hours for more than 20 workers. (c) Most importantly, Grp&Splt allows us to design efficient approximation algorithms with constant approximation factors, as well as instance optimal exact algorithms that work well in practice, as long as the distance between the workers satisfies the metric property (the triangle inequality in particular) [12, 41]. We underscore that the triangle inequality assumption is not an overstretch; rather, many natural distance measures (Euclidean distance, Jaccard distance) are metric, and several other similarity measures, such as cosine similarity and Pearson and Spearman correlations, can be transformed into metric distances [46]. Furthermore, this assumption has been extensively used in distance computation in the related literature [2, 3]. Without metric property assumptions, the problems remain largely inapproximable [41].
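Whether a given distance matrix satisfies the assumptions behind the approximation guarantees can be tested directly. The sketch below checks the Table 2 distances; the function name `is_metric` and the tolerance are assumptions made for this illustration. It checks non-negativity, a zero diagonal, symmetry, and the triangle inequality, and it tolerates zero distance between distinct workers (as between u4 and u6), so strictly speaking it tests for a pseudometric.

```python
from itertools import permutations

# Pairwise worker distances from Table 2 (workers u1..u6 are indexed 0..5).
DIST = [
    [0.0, 1.0, 0.66, 0.66, 0.85, 0.66],
    [1.0, 0.0, 0.66, 0.85, 0.66, 0.85],
    [0.66, 0.66, 0.0, 0.4, 0.66, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
    [0.85, 0.66, 0.66, 0.4, 0.0, 0.4],
    [0.66, 0.85, 0.4, 0.0, 0.4, 0.0],
]


def is_metric(dist, tol=1e-9):
    """Check zero diagonal, non-negativity, symmetry, and triangle inequality."""
    n = len(dist)
    for i in range(n):
        if abs(dist[i][i]) > tol:
            return False
        for j in range(n):
            if dist[i][j] < -tol or abs(dist[i][j] - dist[j][i]) > tol:
                return False
    # dist(i, k) must not exceed the detour through any third worker j.
    return all(dist[i][k] <= dist[i][j] + dist[j][k] + tol
               for i, j, k in permutations(range(n), 3))


print(is_metric(DIST))  # True

# A small counterexample: going 0 -> 2 -> 1 (0.4) is shorter than 0 -> 1 (1.0).
BAD = [[0.0, 1.0, 0.2], [1.0, 0.0, 0.2], [0.2, 0.2, 0.0]]
print(is_metric(BAD))   # False
```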

5. ENFORCING SKILL & COST : GRP 

In this section, we first formalize our proposed approach in 
Grp phase, discuss hardness results, and propose algorithms 
with theoretical guarantees. Recall that our objective is 
to form a single group Q of workers that are cohesive (the 
diameter of that group is minimized), while satisfying the 
skill and the cost constraint. 

Definition 1. Grp: Given a task t, form a single group
$\mathcal{G}$ of workers that minimizes $DiaDist(\mathcal{G})$, while satisfying
the skill and cost constraints, i.e., $\sum_{\forall u \in \mathcal{G}} u_{d_i} \geq Q_i, \forall d_i$, and
$\sum_{\forall u \in \mathcal{G}} w_u \leq C$.
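The two ingredients of Definition 1 can be illustrated with a minimal sketch. It assumes that DiaDist is the largest pairwise distance inside the group and that skills aggregate by summation, as the definition suggests; the function names and the toy data are hypothetical and not taken from our implementation.

```python
from itertools import combinations

def dia_dist(group, dist):
    """Largest pairwise distance inside the group (0 for a single worker)."""
    return max((dist[a][b] for a, b in combinations(group, 2)), default=0.0)

def is_feasible(group, skills, wages, thresholds, budget):
    """Aggregated skill per domain must reach Q_i; total wage must stay within C."""
    for i, q in enumerate(thresholds):
        if sum(skills[u][i] for u in group) < q:
            return False
    return sum(wages[u] for u in group) <= budget

# Hypothetical toy data: three workers and a single skill domain.
dist = {"a": {"a": 0.0, "b": 0.4, "c": 1.0},
        "b": {"a": 0.4, "b": 0.0, "c": 0.7},
        "c": {"a": 1.0, "b": 0.7, "c": 0.0}}
skills = {"a": [0.9], "b": [0.6], "c": [0.8]}
wages = {"a": 0.5, "b": 0.3, "c": 0.4}
```

Grp then asks for the feasible group with the smallest `dia_dist` value.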

Theorem 2. Problem Grp is NP-hard. 

The detailed proof is discussed in Section [5] in appendix. 

Proposed Algorithms for Grp: We discuss two algorithms
at length: a) OptGrp is an instance optimal algorithm.
b) ApprxGrp has a 2-approximation factor,
as long as the distance satisfies the triangle inequality
property. Of course, an additional optimal algorithm is
the ILP formulation itself (referred to as GrpILP in the
experiments), which could be easily adapted from Section [4]. Both
OptGrp and ApprxGrp invoke a subroutine, referred to
as GrpCandidateSet(). We describe a general framework
for this subroutine next.

5.1 Subroutine GrpCandidateSet() 

The input to this subroutine is a set of n workers and a task t
(in particular the skill and the cost constraints of t), and the
output is a worker group that satisfies the skill and cost constraints.
Notice that, if done naively, this computation takes
$2^n$ time. However, Subroutine GrpCandidateSet() uses an
effective pruning strategy that avoids unnecessary computations
and is likely to terminate much faster. It builds a binary
tree representing the possible search space, considering
the workers in an arbitrary order: each node in the tree is
a worker u and has two possible edges (1/0, which respectively
stand for whether u is included in the group or not). A
root-to-leaf path in that tree represents a worker group.

At a given node u, it makes two bound computations:
a) it computes the lower bound of the cost ($LB_C$) of
that path (from the root up to that node), and b) it computes
the upper bound of the skill of that path ($UB_{d_i}$) for each domain.
It compares $LB_C$ with $C$ and compares $UB_{d_i}$ with
$Q_i, \forall d_i$. If $LB_C > C$, or $UB_{d_i} < Q_i$ for any of the domains,
that branch is fully pruned out. Otherwise, it continues the
computation. Figure [1] has further details.
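The pruning rule can be sketched as a small branch-and-bound search. This is an illustrative sketch only: it assumes that $LB_C$ is the wage of the workers already included on the path and that $UB_{d_i}$ adds the skills of all still-undecided workers to those already included. The function name and the toy data in the usage note are hypothetical.

```python
def grp_candidate_set(workers, skills, wages, thresholds, budget):
    """Return the first group meeting the skill and cost constraints, or None."""
    m, n = len(thresholds), len(workers)
    # suffix[k][i]: total skill in domain i of workers[k:], i.e. still undecided.
    suffix = [[0.0] * m for _ in range(n + 1)]
    for k in range(n - 1, -1, -1):
        for i in range(m):
            suffix[k][i] = suffix[k + 1][i] + skills[workers[k]][i]

    def search(k, chosen, cost, got):
        if cost > budget:                      # LB_C > C: prune this branch
            return None
        if any(got[i] + suffix[k][i] < thresholds[i] for i in range(m)):
            return None                        # UB_{d_i} < Q_i: prune this branch
        if all(got[i] >= thresholds[i] for i in range(m)):
            return list(chosen)                # first valid group
        u = workers[k]
        # Edge 1: include worker u.
        found = search(k + 1, chosen + [u], cost + wages[u],
                       [got[i] + skills[u][i] for i in range(m)])
        if found is not None:
            return found
        # Edge 0: exclude worker u.
        return search(k + 1, chosen, cost, got)

    return search(0, [], 0.0, [0.0] * m)
```

For three hypothetical workers with skills 0.5, 0.9, 0.7 and wages 0.6, 0.5, 0.4, a threshold of 1.5 and a budget of 1.0, the search prunes every branch that includes the first worker and returns the other two.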



Figure 1: A partially constructed tree of GrpCandidateSet()
using the example in Section [2]. At the depicted node (edge value 1),
$LB_C = w_{u_6} + w_{u_4} + w_{u_3} + w_{u_5} + w_{u_1} = 3.2$ and $UB_{d_1} = 2.32$.
The entire subtree is pruned, since $LB_C$ (3.2) $> C$.

ApprxGrp() uses this subroutine to find the first valid answer,
whereas Algorithm OptGrp() uses it to return all valid
answers.

5.2 Further Search Space Optimization 

When the skill and cost of the workers are arbitrary, a
keen reader may notice that Subroutine GrpCandidateSet()
may still have to explore $2^n$ potential groups in the worst
case. If, instead, we have only a constant number of costs
and arbitrary skills, or a constant number of skill values
and arbitrary costs, the search space interestingly
becomes polynomial. Of course, the search space is also
polynomial when both are constants.

We describe the constant cost idea further. Instead of
allowing arbitrary worker wages, we can discretize the
workers' wages a priori and create a constant number k of
wage buckets (each worker belongs to one of these
buckets) and build the search tree based on that. When
there are m knowledge domains, this gives rise to a total
of mk buckets. For our running example in Section [2], if
for simplicity we consider only one skill ($d_1$), this would mean
that we discretize all 6 different wages into k buckets (let us
assume k = 2). Of course, depending on the granularity
of the buckets, this introduces some approximation in
the algorithm, as a worker's actual wage is now replaced
by a number which may be less or greater than the
actual one. However, such a discretization may be realistic,
since many crowdsourcing platforms, such as AMT, allow
only one cost per task.

For our running example, let us assume that bucket 1 represents
wages of 0.5 and below, and bucket 2 represents wages between
0.5 and 0.8. Therefore, workers $u_3, u_4, u_6$ will be part
of bucket 2 and the three remaining workers will be part of
bucket 1. After this, one may notice that the tree will be neither
balanced nor exponential. For a given bucket,
the number of possible ways of selecting workers is polynomial
(they will always be selected from the most skilled to the least
skilled), making the overall search space polynomial for a constant
number of buckets. In fact, as opposed to $2^6$ possible
branches, this modified tree can only have $(3 + 1) \times (3 + 1)$
possible branches. Figure [2] describes the idea further.
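The counting argument can be checked with a short sketch that enumerates, for every bucket, only the top-j prefixes of its workers ranked by skill. Only the bucket memberships follow the running example; the skill values and the function name are hypothetical.

```python
from itertools import product

def bucketed_candidates(buckets):
    """Yield groups formed by taking the top-j workers from every bucket.

    `buckets` maps a bucket id to a list of (worker, skill) pairs.
    """
    ranked = [sorted(members, key=lambda p: p[1], reverse=True)
              for members in buckets.values()]
    # A bucket with s workers contributes s + 1 prefixes (top-0 .. top-s).
    for cuts in product(*(range(len(r) + 1) for r in ranked)):
        yield [w for r, j in zip(ranked, cuts) for w, _ in r[:j]]

# Hypothetical skill values; bucket sizes (3 and 3) follow the running example.
buckets = {1: [("u1", 0.6), ("u2", 0.9), ("u5", 0.4)],
           2: [("u3", 0.8), ("u4", 0.7), ("u6", 0.5)]}
groups = list(bucketed_candidates(buckets))
print(len(groups))  # (3 + 1) * (3 + 1) = 16
```

Because a less skilled worker is only ever added after all more skilled workers of the same bucket, no enumerated group contains $u_1$ without $u_2$.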

Once this tree is constructed, our previous pruning algo¬ 
rithm GrpCandidateSet () could be applied to enable further 
efficiency. 



Figure 2: Possible search space using the example in Section [2]
after the costs of the workers are discretized into k = 2 buckets,
considering only one skill $d_1$. The tree is constructed in descending
order of skill of the workers per bucket. For bucket 1, if
the most skilled worker $u_2$ is not selected, the other two workers
($u_1, u_5$) will never be selected.

5.3 Approximation Algorithm ApprxGrp 

A popular variant of the facility dispersion problem [12, 41]
attempts to discover a set of nodes (that host the facilities)
that are as far apart as possible, whereas compact location
problems [11] attempt to minimize the diameter. For us, the
workers are the nodes, and Grp attempts to find a worker
group that minimizes the diameter, while satisfying the multiple
skill constraints and a single cost constraint. We propose a
2-approximation algorithm for Grp, which has not been studied before.

Algorithm ApprxGrp works as follows: The main algorithm
considers a sorted (ascending) list $\mathcal{L}$ of distance values
(this list represents all unique distances between the available
worker pairs in the platform) and performs a binary
search over that list. First, it calls a subroutine (GrpDia())
with a distance value $\alpha$; the subroutine can run at most n times.
Inside the subroutine, it considers worker $u_i$ in the i-th iteration
to retrieve a star graph² centered around $u_i$ that satisfies
the distance $\alpha$. The nodes of the star are the workers
and the edges are the distances between each worker pair,
such that no edge in the retrieved graph has a weight $> \alpha$.
One such star graph is shown in Figure [3].

Next, given a star graph with a set of workers $\mathcal{U}'$, GrpDia
invokes GrpCandidateSet($\mathcal{U}'$, t) to select a subset of workers
(if there is one) from $\mathcal{U}'$ who together satisfy the skill and
cost thresholds. GrpCandidateSet constructs the tree in a
best-first-search manner and terminates when the first valid
solution is found, or when no further search is possible. If the cost
values are further discretized, then the tree is constructed
accordingly, as described in Section [5.2]. This variant of
ApprxGrp is referred to as Cons-k-Cost-ApproxGrp.

Upon returning a non-empty subset $\mathcal{U}''$ of $\mathcal{U}'$,
GrpCandidateSet terminates. Then, ApprxGrp stores that $\alpha$
and the associated $\mathcal{U}''$ and continues its binary search over $\mathcal{L}$ for
a different $\alpha$. Once the binary search ends, it returns the
$\mathcal{U}''$ associated with the smallest $\alpha$ as the solution, with

2 A star graph is a tree on v nodes with one node having degree
v - 1 and the other v - 1 nodes having degree 1.


Algorithm 1 Approximation Algorithm ApprxGrp()

Require: $\mathcal{U}$, human factors for $\mathcal{U}$ and task t
1: List $\mathcal{L}$ contains all unique distance values in increasing order
2: repeat
3:   Perform binary search over $\mathcal{L}$
4:   For a given distance $\alpha$, $\mathcal{U}'$ = GrpDia($\alpha$, {$Q_i, \forall d_i$}, $C$)
5:   if $\mathcal{U}' \neq \emptyset$ then
6:     Store worker group $\mathcal{U}'$ with diameter $d \leq 2\alpha$.
7:   end if
8: until the search is complete
9: return $\mathcal{U}'$ with the smallest d


the diameter upper-bounded by $2\alpha$, as long as the distance
between the workers satisfies the triangle inequality³. In case
GrpDia() returns an empty worker set to the main function,
the binary search continues until there is no more option in
$\mathcal{L}$. If no such $\mathcal{U}''$ is returned by GrpDia(), then
the attempt to find a worker group for the task t
remains unsuccessful.

The pseudo-code of the algorithm ApprxGrp() is presented
in Algorithm [1]. For the given task t using the example
in Section [2], $\mathcal{L}$ is ordered as follows: 0, 0.4, 0.66, 0.85, 1.0.
The binary search process in the first iteration considers
$\alpha = 0.66$ and calls GrpDia($\alpha$, {$Q_i, \forall d_i$}, $C$). In its first
iteration, GrpDia() attempts to find a star graph (see
Figure [3]) with $u_1$ as the center of the star. The returned
graph is then passed, along with the skill thresholds
of t, to GrpCandidateSet(). For our running
example, subroutine GrpDia(0.66, 1.8, 1.66, 1.4, 2.5) returns
$u_1, u_3, u_4, u_6$. Notice that these 4 workers do not satisfy the
skill thresholds of task t (which are 1.8, 1.66, and 1.4, respectively,
for the 3 domains). Therefore, GrpCandidateSet($\mathcal{U}'$, t)
returns false and GrpDia() continues to check whether a star
graph centered around $u_2$ satisfies the distance threshold
0.66. Algorithm [2] presents the pseudocode of this subroutine.
When run on the example in Section [2], ApprxGrp()
returns workers $u_1, u_2, u_3, u_5, u_6$ as the result, with objective
function value upper bounded by $2 \times 0.66$.
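The interplay between the binary search and the star graphs can be sketched as follows. This is a simplified illustration rather than our implementation: `find_group` is a hypothetical stand-in for GrpCandidateSet, and the toy instance (workers on a line, a task that needs three workers, no cost limit) is invented for the example.

```python
def grp_dia(workers, dist, alpha, find_group):
    """Try each worker as a star center; return the first feasible group or None."""
    for center in workers:
        # Star around `center`: every worker within distance alpha of it.
        star = [v for v in workers if dist[center][v] <= alpha]
        group = find_group(star)  # stand-in for GrpCandidateSet
        if group is not None:
            return group
    return None

def apprx_grp(workers, dist, find_group):
    """Binary search over the sorted unique distances; (alpha, group) or None."""
    levels = sorted({dist[a][b] for a in workers for b in workers})
    lo, hi, best = 0, len(levels) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        group = grp_dia(workers, dist, levels[mid], find_group)
        if group is not None:
            best = (levels[mid], group)  # diameter <= 2 * alpha for metric distances
            hi = mid - 1                 # look for a smaller alpha
        else:
            lo = mid + 1                 # no feasible star: alpha must grow
    return best

# Hypothetical toy instance: workers on a line, each with skill 1,
# and a task that needs an aggregated skill of 3 (no cost limit).
pos = {"a": 0, "b": 1, "c": 2, "d": 10}
workers = list(pos)
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}
need_three = lambda star: star if len(star) >= 3 else None

alpha, group = apprx_grp(workers, dist, need_three)
print(alpha, group)  # 1 ['a', 'b', 'c']
```

The returned group has diameter 2, which matches the $2\alpha$ bound for $\alpha = 1$. The binary search is valid because a star for a larger $\alpha$ is a superset of the star for a smaller one, so feasibility is monotone in $\alpha$.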



Figure 3: An instantiation of GrpDia(0.66) using the example
in Section [2]. A star graph centered at $u_1$ is formed.

Theorem 3. Algorithm ApprxGrp has a 2-approximation 
factor, as long as the distance satisfies triangle inequality. 

Lemma 1. Cons-k-Cost-ApproxGrp is polynomial. 

Both these proofs are elaborated in Section [B] in appendix.

5.4 Optimal Algorithm OptGrp 

Subroutine GrpCandidateSet () leaves enough intuition 
behind to design an instance optimal algorithm that works 
well in practice. It calls subroutine GrpCandidateSet () with 

3 Without the triangle inequality assumption, no theoretical guarantee
could be ensured [41].







Algorithm 2 Subroutine GrpDia()

Require: Distance matrix of the worker set $\mathcal{U}$, distance $\alpha$, task t.
1: repeat
2:   for each worker u
3:     form a star graph centered at u, such that for each edge
       $(u, u_j)$, $dist(u, u_j) \leq \alpha$. Let $\mathcal{U}'$ be the set of workers in the
       star graph.
4:   $\mathcal{U}''$ = GrpCandidateSet($\mathcal{U}'$, t)
5:   if $\mathcal{U}'' \neq \emptyset$ then
6:     return $\mathcal{U}''$
7:   end if
8: until all n workers have been fully exhausted
9: return $\mathcal{U}'' = \emptyset$


the actual worker set $\mathcal{U}$ and the task t. For OptGrp, the tree
is constructed in a depth-first fashion inside GrpCandidateSet()
and all valid solutions from the subroutine are returned to
the main function. The output of OptGrp is the candidate
set of workers returned by GrpCandidateSet() whose
largest edge is the smallest. When run on the example in
Section [2], OptGrp returns $\mathcal{G} = \{u_1, u_2, u_3, u_5, u_6\}$ with
objective function value 1.0.

Furthermore, when workers' wages are discretized into k
buckets, OptGrp could be modified as described in Section [5.2];
this variant is referred to as Cons-k-Cost-OptGrp.

Theorem 4. Algorithm OptGrp returns the optimal answer.

Lemma 2. Cons-k-Cost-OptGrp is polynomial. 

Both these proofs are described in Section [B] in appendix. 

6. ENFORCING UPPER CRITICAL MASS : SPLT

When Grp results in a large, unwieldy group $\mathcal{G}$ that may
struggle with collaboration, it needs to be partitioned further
into a set of sub-groups in the Splt phase to satisfy
the upper critical mass ($K$) constraint. At the same time,
if needed, the workers across the subgroups should still be
able to collaborate effectively. These intuitions are
formalized in the Splt phase.

Definition 2. Splt: Given a group $\mathcal{G}$, decompose it into
a disjoint set of subgroups $(G_1, G_2, \ldots, G_x)$ such that $\forall i\, |G_i| \leq K$,
$\sum_i |G_i| = |\mathcal{G}|$, and the aggregated all-pair inter-group distance
$\sum_{\forall G_i, G_j \in \mathcal{G}} SumInterDist(G_i, G_j)$ is minimized.
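The Splt objective can be evaluated with a short sketch. It assumes that SumInterDist of two subgroups is the sum of distances over all worker pairs with one worker in each subgroup; the function names and the toy positions are hypothetical.

```python
from itertools import combinations

def sum_inter_dist(g1, g2, dist):
    """Sum of distances over all worker pairs with one worker in each subgroup."""
    return sum(dist[a][b] for a in g1 for b in g2)

def splt_objective(subgroups, dist):
    """Aggregate SumInterDist over every unordered pair of subgroups."""
    return sum(sum_inter_dist(g1, g2, dist)
               for g1, g2 in combinations(subgroups, 2))

def respects_critical_mass(subgroups, k):
    """Every subgroup holds at most K workers."""
    return all(len(g) <= k for g in subgroups)

# Hypothetical toy instance: four workers placed on a line.
pos = {"a": 0, "b": 1, "c": 5, "d": 6}
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}
print(splt_objective([["a", "b"], ["c", "d"]], dist))  # 5 + 6 + 4 + 5 = 20
print(splt_objective([["a", "c"], ["b", "d"]], dist))  # 1 + 6 + 4 + 1 = 12
```

With a single subgroup there are no cross pairs, so the objective is 0.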

Theorem 5. Problem Splt is NP-hard.

The proof is described in Section [5] in appendix.

Proposed Algorithm for Splt: Since the ILP for Splt
can be very expensive, our primary effort remains in designing
an alternative strategy that is more efficient and that allows
provable bounds on the result quality. We take the following
overall direction: imagine that the output of Grp gives rise
to a large group $\mathcal{G}$ with n' workers, where n' > K. First,
we determine the number of subgroups x and the number
of workers in each subgroup $G_i$. Then, we attempt to find the
optimal partitioning of the n' workers across these x subgroups
that minimizes the objective function. We refer to
this as SpltBOpt, which is the optimal balanced partitioning
of $\mathcal{G}$. For the running example in Section [2], this would mean
creating 2 subgroups, $G_1$ and $G_2$, with 3 workers in one and
the remaining 2 in the second subgroup, using the workers
$u_1, u_2, u_3, u_5, u_6$ returned by ApprxGrp.
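The two steps above can be sketched for small groups as follows. The sketch assumes $x = \lceil n'/K \rceil$ subgroups with K workers in all but the last one, and it finds the balanced partition by exhaustive enumeration, which is only practical for a handful of workers. The function names and the toy positions are hypothetical.

```python
from itertools import combinations
from math import ceil

def subgroup_sizes(n_workers, k):
    """x = ceil(n'/K) subgroups: K workers in all but the last one (n' >= 1)."""
    x = ceil(n_workers / k)
    return [k] * (x - 1) + [n_workers - k * (x - 1)]

def balanced_partitions(workers, sizes):
    """Enumerate every split of `workers` into subgroups of the given sizes."""
    if not sizes:
        yield []
        return
    for first in combinations(workers, sizes[0]):
        rest = [w for w in workers if w not in first]
        for tail in balanced_partitions(rest, sizes[1:]):
            yield [list(first)] + tail

def splt_b_opt(workers, k, dist):
    """Balanced partition with the smallest aggregated inter-group distance."""
    def objective(parts):
        return sum(dist[a][b]
                   for g1, g2 in combinations(parts, 2)
                   for a in g1 for b in g2)
    return min(balanced_partitions(workers, subgroup_sizes(len(workers), k)),
               key=objective)

# Hypothetical toy instance: five workers on a line, K = 3.
pos = {"u1": 0, "u2": 1, "u3": 2, "u5": 10, "u6": 11}
workers = list(pos)
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}
best = splt_b_opt(workers, 3, dist)
```

For five workers and K = 3 the sizes are 3 and 2, as in the running example, and the enumeration visits 10 candidate partitions.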


For the remainder of the section, we investigate how to
find SpltBOpt. There are intuitive as well as logical reasons
behind taking this direction. Intuitively, a lower number
of subgroups gives rise to an overall smaller objective function
value (note that the objective function is in fact 0 when
x = 1). More importantly, as Lemma [3] suggests, under certain
conditions, SpltBOpt gives rise to provable theoretical
results for the Splt problem. Finding the approximation ratio
of SpltBOpt for an arbitrary number of partitions is deferred
to future work.

Lemma 3. SpltBOpt has a 2-approximation for the Splt
problem, if the distance satisfies the triangle inequality, when
$x = \lceil n'/K \rceil = 2$.

The proof is described in Section [B] in appendix. 

Even though the number of subgroups (aka partitions)
is $\lceil n'/K \rceil$, with K workers in all but the last subgroup, finding an
optimal assignment of the n' workers across those subgroups
that minimizes the objective function is NP-hard. The proof
uses an easy reduction from [17]. We start by showing how
the solution to the SpltBOpt problem could be bounded by the
solution of a slightly different problem variant, known as the
Min-Star problem [17].

Definition 3. Min-Star Problem: Given a group $\mathcal{G}$ with
n workers, out of which each of x workers $(u_1, u_2, \ldots, u_x)$
represents the center of a star sub-graph (each sub-graph stands
for a subgroup), the objective is to partition the remaining
$n - x$ workers into these x subgroups $G_1, G_2, \ldots, G_x$
such that $\sum_{i=1}^{x} k_i \, dist(u_i, \bigcup_{j \neq i} G_j) + \sum_{i<j} k_i k_j \, dist(u_i, u_j)$ is
minimized, where $k_i$ is the total number of workers in subgroup
$G_i$.

Intuitively, the Min-Star problem seeks to decompose the worker
set into x subgroups, such that $u_i$ is the center of a star
graph for subgroup $G_i$, and for a fixed set of such workers
$\{u_1, u_2, \ldots, u_x\}$, the contribution of $u_i$ to the objective
function is proportional to the sum of distances of a star
subgraph rooted at $u_i$.

Solving Min-Star: Algorithm Min-Star-Partition: The
pseudocode is listed in Algorithm [3], and additional details
can be found in [17]. The key insight behind this algorithm
is the fact that for a fixed set of workers $\{u_1, u_2, \ldots, u_x\}$, the
second term of the objective function, $\sum_{i<j} k_i k_j \, dist(u_i, u_j)$,
is a constant. Furthermore, this expression could only take
$\binom{n}{x}$ distinct values, corresponding to all possible combinations
of how the workers $\{u_1, u_2, \ldots, u_x\}$ are chosen from the
group $\mathcal{G}$ with n workers. Hence, for a fixed set of workers,
the objective reduces to finding optimal subgroups
$G_1, \ldots, G_x$ that minimize the first expression. Interestingly,
this expression corresponds exactly to a special
case of the popular transportation problem [15] that could
be solved optimally with time complexity $O(n')$ [17]. We
refer to [17] for further details.

Finally, the objective function of SpltBOpt is computed
on the optimal partition of each instance of the transportation
problem, and the one with the least value is returned
as output. When run using $\mathcal{G} = \{u_1, u_2, u_3, u_5, u_6\}$
from ApprxGrp, this algorithm forms subgroups $G_1 = \{u_1, u_2, u_5\}$
and $G_2 = \{u_3, u_6\}$ with objective function value 3.89.

Theorem 6. Algorithm Min-Star-Partition has a 3-approximation
for the SpltBOpt problem.

Lemma 4. Min-Star-Partition is polynomial. 







Algorithm 3 Algorithm Min-Star-Partition

Require: Group $\mathcal{G}$ with n' workers and upper critical mass K
1: $x = \lceil n'/K \rceil$
2: for all subsets $\{u_1, \ldots, u_x\} \subseteq \mathcal{G}$ do
3:   Find optimal subgroups $\{G_1, \ldots, G_x\}$ for $\{u_1, \ldots, u_x\}$ by
     formulating it as a transportation problem
4:   Evaluate the objective function for $\{G_1, \ldots, G_x\}$
5: end for
6: return subgroups $\{G_1, \ldots, G_x\}$ with the least objective function value


Both these proofs are described in Section [B] in appendix. 

7. EXPERIMENTS 

We describe our real and synthetic data experiments to 
evaluate our algorithms next. The real-data experiments 
are conducted at AMT. The synthetic-data experiments are 
conducted using a parametrizable crowd simulator. 

7.1 Real Data Experiments 

Two different collaborative crowdsourcing applications are
evaluated using AMT: i) Collaborative Sentence Translation
(CST), and ii) Collaborative Document Writing (CDW).

Evaluation Criteria: The overall study is designed
to evaluate: (1) the effectiveness of the proposed optimization
model, (2) the effectiveness of the affinity calculation techniques,
and (3) the effect of different upper critical mass values.

Workers: A pool of 120 workers participate in the sen¬ 
tence translation study, whereas, a different pool of 135 
workers participate in the second one. Hired workers are 
directed to our website where the actual tasks are under¬ 
taken. 

Algorithms: We compare our proposed solution with
other baselines: (1) To evaluate the first criterion, the Optimal
algorithm (in Section [4]) is compared against an alternative
Aff-Unaware algorithm [43]. The latter assigns workers to
the tasks considering skill and cost but ignoring affinity. (2)
Optimal-Affinity-Age and Optimal-Affinity-Region are
two optimal algorithms that use two different affinity calculation
methods (Affinity-Age and Affinity-Region, respectively)
and are compared against each other to evaluate
the second criterion. (3) CrtMass-Optimal-K assigns workers
to tasks based on the optimization objective under different
upper critical mass values K, and the resulting variants are
compared against each other.

Pair-wise Affinity Calculation: Designing a complex
personality test [40] to compute affinity is beyond the scope
of this work. We instead choose some simple factors to compute
affinity that have been acknowledged to be indicative
factors in prior works [47]. We calculate affinity in two ways:
1) Affinity-Age: the age-based calculation discretizes workers
into different age buckets and assigns a value of 1 to a
worker pair if they fall into the same bucket, and 0 otherwise.
2) Affinity-Region: assigns a value of 1 when two workers
are from the same country, and 0 otherwise. We continue to
explore more advanced affinity calculation methods in our
ongoing work.
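Both affinity measures are binary and can be sketched in a few lines. The bucket width of 10 years is a hypothetical choice made only for this illustration; the text above does not fix the age bucket boundaries.

```python
def affinity_age(age_a, age_b, bucket_width=10):
    """1 if both ages fall into the same age bucket, 0 otherwise.

    bucket_width is an assumed value; buckets are [0, 10), [10, 20), ...
    """
    return 1 if age_a // bucket_width == age_b // bucket_width else 0

def affinity_region(country_a, country_b):
    """1 if both workers report the same country, 0 otherwise."""
    return 1 if country_a == country_b else 0

print(affinity_age(23, 29), affinity_age(29, 31))  # 1 0
print(affinity_region("India", "India"), affinity_region("India", "France"))  # 1 0
```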

Overall user-study design: The overall study is conducted
in 3 stages: (1) Worker Profiling: in stage 1, we hire
workers and use pre-qualification tests with “gold-data” to
learn their skills. We also learn other human factors, as
described next. (2) Worker-to-Task Assignment: in stage 2,
a subset of these hired workers are re-invited to participate,
and they undertake the actual collaborative tasks.
(3) Task Evaluation: in stage 3, completed tasks
are crowdsourced again to evaluate their quality.

Summary of Results: There are several key takeaways
from our user study results. First and foremost, effective
collaboration is central to ensuring high quality results for
collaborative complex tasks, as demonstrated in Figure [4a] and
Table [5] in appendix. Then, we evaluate 2 different affinity
computation models in Figure [4b], and the results show
that people from the same region collaborate more effectively,
as Optimal-Affinity-Region outperforms
Optimal-Affinity-Age in “correctness”. However, nothing could be said
with statistical significance for the “completeness” dimension.
Both these dimensions are suggested to be indicative
in prior works [47]. Interestingly, upper critical mass
also has a significant effect on collaboration effectiveness and,
consequently, on the quality of the completed tasks, as shown in
Figure [4c]. Quality increases from K = 5 to K = 7, but
it decreases with statistical significance when K = 10 for
CrtMass-Optimal-10. The final results of our collaborative
document writing application, presented in appendix in
Table [5] and in Section [C], support similar observations.

7.1.1 Stage 1 - Worker Profiling 

We hire two different sets of workers for sentence transla¬ 
tion and document writing. The workers are informed that a 
subset of them will be invited (through email) to participate 
in the second stage of the study. 

Skill learning for Sentence Translation: We hire 60
workers and present each worker with a 20-second English
video clip, for which we have the ground truth translation
in 4 different languages: English, French, Tamil, Bengali.
We then ask them to create a translation in one of the languages
(from the last three) that they are most proficient in.
We measure each worker's individual skill using Word Error
Rate (WER) [31].
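For reference, WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference and the submitted translation, divided by the number of reference words. The sketch below follows that standard definition; the function name and sentences are illustrative, and how the WER value is mapped to a skill score is not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)  # assumes a non-empty reference

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A lower WER indicates a translation closer to the ground truth.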

Skill learning for Document Writing: For the second 
study CDW , we hire a different set of 75 workers. We design 
a “gold-data” set that has 8 multiple choice questions per 
task, for which the answers are known (e.g. for the MOOCs 
topic - one question was, “ Who founded Coursera?”). The 
skill of each worker is then calculated as the percentage of 
her correct answers. For simplicity, we consider only one 
skill domain for both applications. 

Wage Expectation of the worker: We explicitly ask
each worker about their expected monetary incentive,
after giving them a high-level description of the tasks
that are conducted in the second stage of the study. Those
inputs are recorded and used in the experiments.

Affinity of the workers: Hired workers are directed to
our website, where they are asked to provide 4 simple
socio-demographic attributes: gender, age, region, and highest
education. Workers' anonymity is fully preserved. From
there, the affinity between workers is calculated using Affinity-Age
or Affinity-Region.

Figure [18] and Figure [TT] in appendix contain detailed 
workers profile distribution information. 

7.1.2 Stage 2 - Worker-to-Task Assignment 

Once the hired workers are profiled, we conduct the sec¬ 
ond and most important stage of this study, where the actual 
tasks are conducted collaboratively. 

Collaborative Sentence Translation(CST): We carefully choose 







Task Name                  Skill   Cost   Critical Mass
CST1 - Destroyer           3.0     $5.0   5, 7, 10
CST2 - German Weapons      4.0     $5.0   5, 7, 10
CST3 - British Aircraft    3       $4.5   5, 7, 10
CDW1 - MOOCs               5       $3     5, 7, 10
CDW2 - Smartphone          5       $3     5, 7, 10
CDW3 - top-10 place        5       $3     5, 7, 10

Table 4: Description of different tasks; the default upper critical
mass value is 5. Default affinity calculation is region based.


three English documentaries of suitable complexity and a length
of about 1 minute for creating subtitles in three different
languages: French, Tamil, and Bengali. These videos are chosen
from YouTube with titles: (1) Destroyer, (2) German
Small Weapons, (3) British Aircraft TSR2.

Collaborative Document Writing (CDW): Three different
topics are chosen for this application: 1) MOOCs and their
evolution, 2) Smart Phones and their evolution, 3) Top-10 places
to visit in the world.

For simplicity and ease of quantification, we consider that
each task requires only one skill (the ability to translate from
English to one of the three other languages for CST, and
expertise on the topic for CDW). The skill and cost requirements
of each task are described in Table [4]. These
values are set by involving domain experts and discussing
the complexity of the tasks with them.

Collaborative Task Assignment for CST: We set up
2 different worker groups per task and compare two algorithms,
Optimal-CST and Aff-Unaware-CST, to evaluate the
effectiveness of the proposed optimization model. We set up
2 additional worker groups for each task to compare
Optimal-Affinity-Region with Optimal-Affinity-Age. Finally,
we set up 3 additional groups per task to evaluate the
effect of critical mass and compare CrtMass-Optimal-5,
CrtMass-Optimal-7, and CrtMass-Optimal-10. This way, a
total of 15 groups are created. We instruct the workers to work
incrementally, building on other group members' contributions, and
to leave comments as they finish their work. These sets of
tasks are kept active for 3 days.

Collaborative Task Assignment for CDW: A similar
strategy is adopted to collaboratively edit a document
within 300 words, using the quality, cost, and critical mass
values of the document editing tasks described in Table [4].
Workers are advised to use the answers to the Stage-1
questionnaires as a reference.

7.1.3 Stage 3 - Task Evaluation 

Collaborative tasks, such as knowledge synthesis, are of¬ 
ten subjective. An appropriate technique to evaluate their 
quality is to leverage the wisdom of the crowds. This way 
a diverse and large enough group of individuals can accu¬ 
rately evaluate information to nullify individual biases and 
the herding effect. Therefore, in this stage we crowdsource 
the task evaluation for both of our applications. 

For the first study, Sentence Translation (CST), we take
the 15 final outcomes of the translation tasks as well as the
original video clips and set them up as 3 different HITs
in AMT. The first HIT is designed to evaluate the optimization
model, the second one to evaluate the two different affinity
computation models, and the final one to evaluate the
effectiveness of upper critical mass. We assign 20 workers to
each HIT, totaling 60 new workers. Completed tasks are
evaluated along two quality dimensions, as identified
by prior work [47]: 1. correctness of translation, and 2. completeness
of translation. The workers are asked to rate the
quality on a scale of 1 to 5 (higher is better) without knowing
the underlying task production algorithm. Then, we average
these ratings, which is similar to obtaining the viewpoint of
the average reader. The CST results for the different evaluation
dimensions are presented in Figure [4].
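The aggregation of the ratings can be illustrated with a brief sketch that reports the mean rating together with its standard error, the quantity shown as error bars in Figure 4. The ratings below are invented for the illustration and are not data from our study.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(ratings):
    """Mean rating and standard error of the mean (needs at least 2 ratings)."""
    return mean(ratings), stdev(ratings) / sqrt(len(ratings))

# Hypothetical ratings on the 1-5 scale from five evaluators.
avg, se = summarize([1, 2, 3, 4, 5])
```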

A similar strategy is undertaken for the CDW application,
but the quality is assessed using 5 key quality
aspects, as proposed in prior work [6]. For lack of space,
we present a subset of these results in Section [C] of the
appendix in Table [5]. Both sets of results indicate that
our proposed model successfully incorporates different elements
that are essential to ensure high quality in collaborative
crowdsourcing tasks.

7.2 Synthetic Data Experiments 

We conduct our synthetic data experiments on an Intel
Core i5 with 6 GB RAM. We use IBM CPLEX 12.5.1 for the
ILP. A crowd simulator is implemented in Java to generate
the crowdsourcing environment. All numbers are presented
as the average of three runs.

Simulator Parametrization: The simulator parame¬ 
ters presented below are chosen akin to their respective dis¬ 
tributions, observed in our real AMT populations. 

1. Simulation Period - We simulate the system for a time
period of 10 days, i.e., 14400 simulation units, with each
simulation unit corresponding to 1 minute. With one task
arriving every 10 minutes, our default setting runs for 1 day
and has 144 tasks.

2. # of Workers - The default is 100, but we vary $|\mathcal{U}|$ up to 5000
workers.

3. Workers' skill and wage - The variable $u_{d_i}$ in skill $d_i$
receives a random value from a normal distribution with
mean 0.8 and variance 0.15. Workers' wages are
also set using the same normal distribution.

4. Task profile - The task quality $Q_i$, as well as the cost $C$, is
generated using a normal distribution with mean 15
and variance 1 as default. Unless otherwise stated, each task
has a single skill.

5. Distance - Unless otherwise stated, we consider distance
to be metric and generated using Euclidean distance.

6. Critical Mass - The default value is 7.

7. Worker Arrival, Task Arrival - By default, both workers
and tasks arrive following a Poisson process, with arrival
rates of $\mu$ = 5 per minute and 1 per 10 minutes, respectively.
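The default parametrization can be mimicked with a small sketch. Our simulator is written in Java; the Python version below is only an illustration of the sampling steps, with hypothetical function names. It draws skills and wages from a normal distribution with variance 0.15 (so the standard deviation is its square root) and generates Poisson arrivals through exponentially distributed inter-arrival gaps. It does not truncate negative samples, since the text above does not say how such values are handled.

```python
import random
from math import sqrt

def sample_worker(rng, n_domains=1, mean=0.8, variance=0.15):
    """Draw per-domain skills and a wage from the same normal distribution."""
    sigma = sqrt(variance)  # random.gauss expects a standard deviation
    skills = [rng.gauss(mean, sigma) for _ in range(n_domains)]
    wage = rng.gauss(mean, sigma)
    return skills, wage

def poisson_arrivals(rng, rate_per_minute, horizon_minutes):
    """Arrival times of a Poisson process, via exponential inter-arrival gaps."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_minute)
        if t > horizon_minutes:
            return times
        times.append(t)

rng = random.Random(0)  # fixed seed, chosen only for repeatability
# One simulated day (1440 minutes) with one task per 10 minutes on average.
task_times = poisson_arrivals(rng, rate_per_minute=1 / 10, horizon_minutes=1440)
skills, wage = sample_worker(rng)
```

With these rates, a one-day run is expected to produce about 144 task arrivals, in line with the default setting.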

Implemented Algorithms: 1. Overall-ILP: An ILP,
as described in Section [4].

2. Grp&Splt: Uses ApprxGrp for Grp and Min-Star-Partition
for Splt.

3. Grp'&Greedy: An alternative implementation. In phase 1,
we output a random group of workers that satisfies the skill and
cost thresholds. In phase 2, we partition the workers greedily into
the most similar subgroups satisfying the critical mass constraint.

4. Cons-k-Cost-ApprxGRP/Cons-k-Cost-OptGRP: with k = 15
as default, as discussed in Section [5.3] and Section [5.4],
respectively.

5. GrpILP: An ILP for Grp.

6. No implementation of existing related work: Due to the critical
mass constraint, we intend to form a group that is further
partitioned into a set of subgroups, whereas no prior work
has studied the problem of forming a group along with subgroups,
which makes our problem and solution unique.















[Figure 4 panels: (a) Optimization Model; (b) Affinity Calculation; (c) Upper Critical Mass; (d) A French Translation Sample]

Text of panel (d):

The destroyers are among the fastest and most deadly worships
ever built. Mounting a powerful ? of offensive and defensive
weapons, they can serve equally well as escorts for other vessels
more in form of a ? in their own right.

Les destroyers sont parmi les plus rapides et les plus meurtrieres
jamais construits. Montage d'un ? puissant offensives et
defensives, ils peuvent tout aussi bien servir d'escortes pour les
autres navires. Au debut, les navires etais corpus exclusivement
pour detruire les bateaux ?.

Les destructeurs sont parmi les plus rapides et les plus meurtriers
jamais construits. Montage d'un puissant arsenal d'armes
defensives et offensives, ils peuvent tout aussi bien servir
d'escorte aux autres navires, plus sous forme de (formidable
navire d'attaque?) dans leur propre droit. Au debut, les navires
etaient corpus exclusivement pour detruire les bateaux Paxon.


Figure 4: Stage 3 results of sentence translation: Collected data with statistical significance (standard error) is presented. These
results clearly corroborate that our affinity-aware optimization model Optimal-CST outperforms its affinity-unaware counterpart [43]
with statistical significance across both quality dimensions. Optimal-Affinity-Region appears to outperform Optimal-Affinity-Age in
“correctness”. The results of CrtMass-Optimal-10 clearly appear to be less effective than the other two, showing some anecdotal evidence
that group size is important in collaborative crowdsourcing applications.

Summary of Results: Our synthetic experiments also
exhibit many interesting insights. First and foremost, Grp&Splt
is a reasonable alternative formulation to solve AffAware-Crowd,
both qualitatively and efficiency-wise, as Overall-ILP is
not scalable and does not converge for more than 20 workers.
Second, our proposed approximation algorithms for
Grp&Splt are both efficient and effective, and they
significantly outperform other competitors. Finally, our proposed
formulation AffAware-Crowd is an effective way to optimize
complex collaborative crowdsourcing tasks in a real-world
setting. We first present the overall quality and scalability
of the combined Grp&Splt, followed by that of Grp
individually. Individual Splt experiments are along the expected
lines (our approach is better than the ILP, with quality closer
to optimal), and we omit those results for brevity.

7.2.1 Quality Evaluation 

We present the quality evaluations next. 

7.2.1.1 Grp&Splt Quality.

The average of overall objective function value, which is 
the sum of DiaDist(G) and aggregated all pair SumlnterDist () 
across the subgroups, is evaluated and presented as mean ob¬ 
jective function value for 144 tasks. Overall-ILP does not 
converge beyond 20 workers. 
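
The objective described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: it assumes DiaDist(G) is the largest pairwise distance inside the whole group and SumInterDist() is the sum of distances over all worker pairs that fall in two different subgroups. The function names and the matrix layout `dist[u][v]` are hypothetical.

```python
from itertools import combinations


def dia_dist(group, dist):
    """Diameter of a group: the largest pairwise distance (assumed definition)."""
    return max((dist[u][v] for u, v in combinations(group, 2)), default=0.0)


def sum_inter_dist(sub_a, sub_b, dist):
    """Sum of distances over all pairs that cross two subgroups (assumed definition)."""
    return sum(dist[u][v] for u in sub_a for v in sub_b)


def objective(subgroups, dist):
    """DiaDist of the whole group plus SumInterDist over all subgroup pairs."""
    group = [u for sub in subgroups for u in sub]
    inter = sum(sum_inter_dist(a, b, dist) for a, b in combinations(subgroups, 2))
    return dia_dist(group, dist) + inter
```

Lower values are better: a small diameter means a cohesive group, and a small inter-distance means few costly links between subgroups.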

Varying # of Workers: Figure 5 shows the results with
mean skill = 15 and variance = 1. Grp&Splt
outperforms Greedy-Partition in all cases, while being
very comparable with Overall-ILP.

Varying Task Mean Skill: With varying mean skill (cost
is proportional to skill), Figure 6 demonstrates that the
objective function value gets higher (hence worse) for both
algorithms as the skill/cost requirement increases, while Grp&Splt
outperforms Grp'&Greedy. This is intuitive: as the skill
requirement increases, the generated group becomes larger,
which further decreases worker cohesiveness.

Varying Critical Mass: As Figure 7 shows, with increasing
critical mass, the quality of both solutions improves,
because the aggregated inter-distance across the partition gets
smaller as fewer edges cross between subgroups.

Varying Simulation Period: In Figure 8, the simulation
period is varied, where both workers and tasks arrive according
to a Poisson process. Grp&Splt convincingly outperforms
Grp'&Greedy in this experiment.

Varying # Cost Buckets: We also ran experiments varying
the number of cost buckets k for
Cons-k-Cost-ApprxGRP/Cons-k-Cost-OptGRP. With increasing
k, the objective function value gets slightly better in general.

7.2.1.2 Grp Phase Quality. 

The objective function is the average DiaDist() value.

Varying Task Mean Skill: Figure 9 demonstrates that,
although GrpApprx can theoretically be 2 times worse than
optimal, in practice it is as good as the optimal GrpILP.

Varying Simulation Period: Figure 10 demonstrates that,
as more workers become active in the system, GrpILP cannot
converge. Hence, we cannot obtain results for GrpILP beyond
day 2. GrpApprx, however, works fine and achieves an almost
optimal result.

7.2.2 Efficiency Evaluation

In this section, we demonstrate the scalability aspects of
our proposed algorithms and compare them with other
competitive methods by measuring the average completion time
of a task. As above, we first present the overall time for
both the Grp and Splt phases, followed by the Grp phase alone.

7.2.2.1 Grp&Splt Efficiency.

Varying # Workers: Figure 11 demonstrates that our
solution Grp&Splt is highly scalable, whereas Overall-ILP
fails to converge beyond 20 workers. Grp'&Greedy is also
scalable (because of the simple algorithm in it), but clearly
does not ensure high quality.

Varying Task Mean Skill: Akin to the previous result, Grp&Splt
and Grp'&Greedy are both scalable, while Grp&Splt achieves higher
quality. We omit the chart for brevity.

Varying Critical Mass: As before, increasing critical mass 
leads to better efficiency for the algorithms. We omit the 
chart for brevity. 

Varying Simulation Period: The corresponding figure demonstrates that
Grp&Splt is highly scalable in a real crowdsourcing environment,
where more and more workers enter the
system. The results show that Grp'&Greedy is also scalable
(but significantly worse in quality). As expected, efficiency
decreases for both as the number of workers increases.

7.2.2.2 Grp Phase Efficiency. 

We evaluate the efficiency of ApprxGrp by reporting the mean
completion time for 144 tasks.

Varying Task Mean Skill: As Figure 12 demonstrates,
ApprxGrp outperforms GrpILP significantly. With a higher
skill threshold, the difference becomes even more noticeable.

Varying Simulation Period: Figure 14 shows the average
task completion time on each day for ApprxGrp, GrpILP, and
Grp'&Greedy. Clearly, GrpILP is impractical to use as more
workers arrive in the system.

8. RELATED WORK 

While no prior work has investigated the problem we study 
here, we discuss how our work is different from a few existing 
works that discuss the challenges in crowdsourcing complex 
tasks, as well as traditional team formation problems. 

Crowdsourcing Complex Tasks: This type of human-based
computation [29][28] handles tasks related to knowledge
production, such as article writing, sentence translation,
citizen science, product design, etc. These tasks are
conducted in groups, are less decomposable than
micro-tasks (such as image tagging) [16][21], and their quality
is measured on a continuous, rather than binary, scale.


A number of crowdsourcing tools are designed to solve
application-specific complex tasks. Soylent uses crowdsourcing
inside a word processor to improve the quality of a written
article [5]. Legion, a real-time user interface, enables
integration of input from multiple crowd workers at the same time
[35]. TurKit provides an interface for programmers to use human
computation inside their programming model [37] and
avoids redundancy by using a crash-and-rerun model, which
reuses earlier results from the assigned tasks. Jabberwocky is
another platform, which leverages social network information
to assign tasks to workers and provides an easy-to-use
interface for programmers [1]. CrowdForge divides a complex
task into smaller sub-tasks in a map-reduce fashion [30].
Turkomatic introduces a framework in which workers aid
requesters in breaking down the workflow of a complex task,
thereby helping to solve it in systematic steps [32].

Unfortunately, these related works are narrowly targeted at
specific applications, and none of them performs optimization-based
task assignment, as ours does. A preliminary work discusses
modular team structures for complex crowdsourcing tasks,
focusing however on the application cases rather than
on the computational challenges [9]. One prior work
investigates how to assign workers to tasks in
knowledge-intensive crowdsourcing [43] and its computational challenges.
However, that work investigates neither the necessity
nor the benefit of collaboration. Consequently, its problem
formulation and proposed solutions are substantially
different from the ones studied here.

Automated Team Formation: Although only tangentially
related to crowdsourcing, automated team formation is
widely studied in computer-assisted cooperative systems.
The work in [34] forms a team of experts in social networks with the
focus of minimizing the coordination cost among team members.
Although their coordination cost is akin to our affinity,
unlike us, that work does not consider multiple skills. Team
formation to balance workload with multiple skills is
studied later in [5], and multi-objective optimization on
coordination cost and workload balancing is also proposed [3][38],
where coordination cost is posed as a constraint.
Density-based coordination is introduced in [13], where multiple
workers with similar skills are required in a team, as in
our setting. Formation of a team with a leader (moderator) is
studied in [22]. Minimizing both communication cost and budget
while forming a team is first considered in [23][24]. The
concept of Pareto-optimal groups, related to skyline research,
is studied in [23].

While several elements of our optimization model are
actually adapted from these related works, there are many stark
differences that preclude any easy adaptation of the team
formation research to our problem. Unlike us, none of these
works considers upper critical mass as a group size
constraint, which requires splitting a group into multiple subgroups
and makes the former algorithms inapplicable in our setting.
Additionally, none of these prior works studies our problem with the
objective of maximizing affinity with multiple skills and cost
constraints. In [8], the authors demonstrate empirically that
utility decreases for larger teams, which validates our
approach of dividing a group into multiple sub-groups obeying
upper critical mass. However, no optimization is proposed
to solve the problem.

In summary, principled optimization opportunities for
complex collaborative tasks, aimed at maximizing collaborative
effectiveness under quality and budget constraints, are studied
for the first time in this work.

9. CONCLUSION 

We initiate the study of optimizing “collaboration”, which
naturally fits many complex human-intensive tasks. We
make several contributions: we appropriately adapt various
individual and group-based human factors critical to the
successful completion of complex collaborative tasks, and
propose a set of optimization objectives by appropriately
incorporating their complex interplay. Then, we present
rigorous analyses to understand the complexity of the
proposed problems and an array of efficient algorithms with
provable guarantees. Finally, we conduct a detailed
experimental study using two real-world applications and synthetic
data to validate the effectiveness and efficiency of our
proposed algorithms. Ours is one of the first formal
investigations to optimize collaborative crowdsourcing. Conducting
even larger-scale user studies using a variety of objective
functions is one of our ongoing research directions.

B. PROOFS OF THE THEOREMS AND LEMMAS

Proof: Theorem 1 - AffAware-Crowd is NP-hard.

Proof. Sketch: Given a collaborative task $t$, a set
of users $\mathcal{U}$, and a real number $X$, the decision version
of the problem is whether there is a group $\mathcal{G}$ (further
partitioned into multiple subgroups) of users ($\mathcal{G} \subseteq \mathcal{U}$) such
that the aggregated inter and intra distance value of $\mathcal{G}$ is $X$
and the skill, cost, and critical mass constraints of $t$ are
satisfied. The membership verification of the decision version of
AffAware-Crowd is clearly polynomial.

To prove NP-hardness, we consider a variant of the compact
location problem [11], which is known to be NP-Complete.
Given a complete graph $G$ with $N$ nodes, an integer $n' < N$,
and a real number $X'$, the decision version of the problem
is whether there is a complete sub-graph $g'$ of size $n'$
such that the maximum distance between any pair
of nodes in $g'$ is $X'$. This variant of the compact location
problem is known as Min-DIA in [11].

Our NP-hardness proof takes an instance of Min-DIA and
reduces it to an instance of the AffAware-Crowd problem in
polynomial time. The reduction works as follows: each node
in graph $G$ represents a worker $u$, and the distance between
any two nodes in $G$ is the distance between a pair of workers
in our problem. We assume that the number of skill domains
is 1, i.e., $m = 1$. Additionally, we consider that each worker
$u$ has the same skill value of 1 on that domain, i.e., $u_d = 1, \forall u$,
and that their cost is 0, i.e., $w_u = 0, \forall u$. Next, we describe the
settings of the task $t$. For our problem, the task also has
a quality requirement in only one domain, namely $Q_1$.
The skill, cost, and critical mass of $t$ are $(Q_1 = n', C =
0, K = \infty)$. This creates an instance of our problem
in polynomial time. Now, the objective is to form a group $\mathcal{G}$
for task $t$ such that all the constraints are satisfied and the
objective function value of AffAware-Crowd is $X'$. A solution
to the Min-DIA problem exists if and only
if a solution to our instance of AffAware-Crowd exists. □
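
The reduction above is mechanical enough to write down. The following is only an illustrative sketch of the instance construction described in the proof; the container `AffAwareInstance`, its field names, and the function `reduce_min_dia` are hypothetical and are not part of the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class AffAwareInstance:
    """Hypothetical container for an AffAware-Crowd instance with one skill domain."""
    dist: list   # pairwise worker distances, copied from the Min-DIA graph
    skill: list  # skill[u] on the single domain (m = 1)
    wage: list   # wage[u] of each worker
    Q1: float    # skill requirement of the task
    C: float     # cost budget of the task
    K: float     # upper critical mass


def reduce_min_dia(dist, n_prime):
    """Map a Min-DIA instance (distance matrix, n') to an AffAware-Crowd instance."""
    n = len(dist)
    return AffAwareInstance(
        dist=[row[:] for row in dist],  # each node becomes a worker
        skill=[1] * n,                  # every worker has skill 1
        wage=[0] * n,                   # every worker has cost 0
        Q1=n_prime,                     # skill requirement Q_1 = n'
        C=0,                            # budget C = 0
        K=math.inf,                     # no effective critical mass limit
    )
```

The construction touches each matrix entry once, so it runs in time polynomial in the size of the Min-DIA instance.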

Proof: Theorem 2 - Grp is NP-hard. 

Proof. Sketch: Given a collaborative task $t$ with a critical
mass constraint, a set of users $\mathcal{U}$, and a real number
$X$, the decision version of the problem is whether there is
a group $\mathcal{G}$ of users ($\mathcal{G} \subseteq \mathcal{U}$) such that the diameter is $X$
and the skill and cost constraints of $t$ are satisfied. The
membership verification of this decision version of Grp is clearly
polynomial.

To prove NP-hardness, we follow a strategy similar to the one
above. We use an instance of Min-DIA [11] and reduce it
to an instance of Grp, as follows: each node in graph $G$ of
Min-DIA represents a worker $u$, and the distance between any
two nodes in $G$ is the distance between a pair of workers in
our problem. We assume that the number of skill domains is
1, i.e., $m = 1$. Additionally, we consider that each worker $u$
has the same skill value of 1 on that domain, i.e., $u_d = 1, \forall u$,
and that their cost is 0, i.e., $w_u = 0, \forall u$. Task $t$ has a quality
requirement on only one domain, namely $Q_1$. The skill
requirement of $t$ is $Q_1 = n'$ and the cost is $C = 0$. Now, the
objective is to form a group $\mathcal{G}$ for task $t$ such that the skill
and cost constraints are satisfied with the diameter of Grp being
$X'$. A solution to the Min-DIA problem exists
if and only if a solution to our instance of Grp exists. □

Figure 15: An instantiation of GrpDia(0.66) using the example
in Section 2. The clique involving $u_1, u_3, u_4, u_6$ cannot have
an edge with distance $> 2 \times 0.66$, due to the triangle inequality
property.

Proof: Theorem 3 - Algorithm ApprxGrp has a 2-approximation
factor, as long as the distance satisfies
the triangle inequality.

Proof. Algorithm ApprxGrp works as follows: it
sorts the distance values in ascending order to create a
list $\mathcal{L}$ and performs a binary search over it. For a given
distance value $\alpha$, it makes a call to GrpDia($\alpha$). Recall Figure 3,
which forms a star graph centered on $u_1$ with GrpDia(0.66)
using the example in Section 2. Consider Figure 15 and
notice that, for a given distance value $\alpha$, the complete graph
induced by the star graph cannot have any edge that is
larger than $2\alpha$, as long as the distance satisfies the
triangle inequality property. Therefore, when GrpDia($\alpha$)
returns a non-empty worker set (which happens only when the
skill and cost thresholds are satisfied), those workers
satisfy the skill and cost thresholds with an optimization
objective value of $\le 2\alpha$. Next, notice that algorithm
ApprxGrp attempts to return the smallest distance
$\alpha$ for which GrpDia($\alpha$) returns a non-empty set, as it
performs a binary search over the sorted list of distance values
(sorted from smallest to largest). Therefore,
any group of workers returned by ApprxGrp satisfies the skill
and cost thresholds, and its DiaDist(G) is at most 2 times
worse than the optimal. Hence the approximation factor
holds. □
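
The binary search over star graphs can be sketched as follows. This is a simplified illustration, not the paper's implementation. It assumes a single skill domain, non-negative costs, and a symmetric distance matrix `dist`. The feasibility test inside `grp_dia` is a brute-force subset check that stands in for the paper's own subroutine, so it is only usable on tiny inputs. All function and parameter names are hypothetical.

```python
from itertools import combinations


def grp_dia(alpha, dist, skill, cost, Q, C):
    """Return a feasible group inside some star of radius alpha, else None."""
    n = len(skill)
    for center in range(n):
        # Star centered on `center`: every worker within distance alpha of it.
        star = [v for v in range(n) if dist[center][v] <= alpha]
        # Brute-force feasibility check (illustration only, exponential time).
        for size in range(1, len(star) + 1):
            for group in combinations(star, size):
                if (sum(skill[v] for v in group) >= Q
                        and sum(cost[v] for v in group) <= C):
                    return list(group)
    return None


def apprx_grp(dist, skill, cost, Q, C):
    """Binary search for the smallest alpha whose star admits a feasible group."""
    n = len(skill)
    # Sorted list of candidate radii; 0.0 covers single-worker groups.
    values = sorted({0.0} | {dist[i][j] for i in range(n) for j in range(i + 1, n)})
    lo, hi, best = 0, len(values) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        group = grp_dia(values[mid], dist, skill, cost, Q, C)
        if group is not None:
            best, hi = group, mid - 1  # feasible: try a smaller radius
        else:
            lo = mid + 1               # infeasible: a larger radius is needed
    return best
```

Every returned worker lies within the final radius of the star's center, so under the triangle inequality any two returned workers are at most twice that radius apart, which is the bound used in the proof.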

Proof: Lemma 1 - Cons-k-Cost-ApproxGrp is polynomial.

Proof. Under a constant number $k$ of costs, subroutine
GrpCandidateSet() takes polynomial computation
time $O((p+1)^{mk})$ in the worst case, where $p$ is the
maximum number of workers in one of the $k$ buckets ($p = O(n)$).
Subroutine GrpDia() runs for all $n$ workers in the worst
case, and there are at most $\log_2 |\mathcal{L}|$ calls to
GrpDia() from the main function ($|\mathcal{L}| = O(n^2)$). Therefore,
the asymptotic complexity of Cons-k-Cost-ApproxGrp is
$O(n \times \log_2 |\mathcal{L}| \times (p+1)^{mk})$, which is polynomial. □

Proof: Theorem 4 - Algorithm OptGrp returns the
optimal answer.

Proof. Sketch: Algorithm OptGrp invokes the subroutine
GrpCandidateSet(). Notice that GrpCandidateSet()
operates in the spirit of the branch-and-bound technique [36]
to efficiently explore the search space and avoid
unnecessary computations. GrpCandidateSet() exploits the upper
bound on cost and the lower bound on skill to prune all
unnecessary branches of the search tree, as shown in Figure 1 and
Figure 2. Nevertheless, this subroutine returns all valid worker
groups to OptGrp, and the main function then selects the
group with the smallest longest edge (i.e., the smallest
diameter value), which minimizes the objective function. Therefore,
OptGrp is instance optimal, i.e., it returns the group of
workers with the smallest diameter distance that satisfies the
skill and cost thresholds. Hence, OptGrp returns the optimal
answer. □
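
The argument above can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the paper's algorithm: it assumes a single skill domain and non-negative costs, and it uses a plain include/exclude search tree with the two pruning rules named in the proof. The function names mirror the subroutines for readability, but the bodies are hypothetical.

```python
from itertools import combinations


def grp_candidate_set(skill, cost, Q, C):
    """Enumerate all groups with total skill >= Q and total cost <= C."""
    n = len(skill)
    # suffix[i] = total skill still obtainable from workers i..n-1.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + skill[i]
    valid = []

    def branch(i, group, s, c):
        if c > C:                # cost upper bound violated: prune
            return
        if s + suffix[i] < Q:    # skill lower bound unreachable: prune
            return
        if i == n:               # leaf: both thresholds hold here
            valid.append(tuple(group))
            return
        group.append(i)          # branch 1: include worker i
        branch(i + 1, group, s + skill[i], c + cost[i])
        group.pop()
        branch(i + 1, group, s, c)  # branch 2: exclude worker i

    branch(0, [], 0, 0)
    return valid


def diameter(group, dist):
    """Longest edge inside the group (0.0 for fewer than two workers)."""
    return max((dist[u][v] for u, v in combinations(group, 2)), default=0.0)


def opt_grp(dist, skill, cost, Q, C):
    """Pick the valid group with the smallest diameter, or None if none exists."""
    candidates = grp_candidate_set(skill, cost, Q, C)
    return min(candidates, key=lambda g: diameter(g, dist), default=None)
```

Pruning only discards branches that cannot lead to a valid group, so the minimum is still taken over every valid group, which is why the result is optimal.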

Proof: Lemma 2 - Cons-k-Cost-OptGrp is polynomial.

Proof. Under a constant number $k$ of costs, subroutine
GrpCandidateSet() takes polynomial computation
time $O((p+1)^{mk})$ in the worst case, with $p$ as in Lemma 1.
Once the subroutine returns all valid answers, the main
function selects the one that has the smallest diameter.
Therefore, the computation time of Cons-k-Cost-OptGrp is
dominated by the computation time of the subroutine
GrpCandidateSet(), and Algorithm Cons-k-Cost-OptGrp runs in
polynomial time $O((p+1)^{mk})$. □

Proof: Theorem 5 - Problem Splt is NP-hard.

Proof. Given a group $\mathcal{G}$, an upper critical mass
constraint $K$, and a real number $X$, the decision version of
Splt is whether $\mathcal{G}$ can be decomposed into a set of subgroups
such that the aggregated distance across the subgroups is
$X$ and the size of each subgroup is $\le K$. The membership
verification of Splt is clearly polynomial.

To prove NP-hardness, we reduce Minimum Bisection [25],
which is known to be NP-hard, to an instance of the Splt
problem.

Given a graph $G(V, E)$ with non-negative edge weights,
the goal of the Minimum Bisection problem is to create 2
non-overlapping partitions of equal size such that the total weight
of the cut is minimized. The problem remains hard
even when the graph is complete [25].

Given a complete graph with $n$ nodes, the decision
version of the Minimum Bisection problem is whether
there exist 2 partitions of equal size such that the total
weight of the cut is $X'$. We reduce an instance of Minimum
Bisection to an instance of Splt as follows: the complete
graph represents the set of workers, with the non-negative edge
weights as their distances, and we wish to decompose this group into
two sub-groups, where the upper critical mass is set to
$K = n/2$. Now, the objective is to form the sub-groups
with an aggregated inter-distance of $X'$. A solution
to the Minimum Bisection problem exists if and
only if a solution to our instance of Splt exists. □

Proof: Lemma 3 - SpltBOpt has a 2-approximation
for the Splt problem if the distance satisfies the triangle
inequality, when $x = \lceil n/K \rceil = 2$.




Figure 16: Balanced partitioning in SpltBOpt when the
distance satisfies the triangle inequality, for a graph with 6 nodes. The
left-hand figure has two partitions ({a, b, c}, {d, e, f}) with 3
nodes in each (red nodes form one partition and blue nodes
the other). Intra-partition edges are drawn solid, whereas
inter-partition edges are drawn dashed. Assuming $K = 4$, in
the right-hand figure node d is moved to join a, b, c. This
increases the overall inter-partition weight, but the increase is
bounded by a factor of 2.


Proof. Sketch: For the purpose of illustration, imagine
that a graph with $n'$ nodes is decomposed into two
partitions. Without loss of generality, imagine partition-1 has
$n_1$ nodes and partition-2 has $n_2$ nodes, where $n_1 + n_2 = n'$,
with total weight $w'$. Let $K$ be the upper critical mass
and assume that $K > n_1$, $K > n_2$. For such a scenario,
SpltBOpt will move one or more nodes from the lighter
partition to the heavier one, until the latter has exactly $K$ nodes
(if both partitions have the same number of nodes, it will
choose the one that gives rise to the overall lower weight).
Notice that the worst case happens when some of the intra-edges
with higher weights become inter-edges due to this
balancing act. Of course, some inter-edges also get knocked
off and become intra-edges. It is easy to notice that the
number of inter-edges that get knocked off is always larger
than the number of inter-edges added (because the
move is always from the lighter partition to the heavier one).
The next argument relies heavily on the triangle
inequality property. In the worst case, every edge that gets
added due to balancing could be at most twice the weight
of an edge that gets knocked off. Therefore, an optimal
solution of SpltBOpt has a 2-approximation factor for the Splt
problem.

An example scenario of such a balancing is
illustrated in Figure 16, where $n_1 = n_2 = 3$ and $K = 4$. Notice that
after this balancing, three inter-edges get deleted ($ad$, $bd$, $cd$),
each of weight $\alpha$, and two inter-edges get added, each
of weight $2\alpha$. However, the approximation factor of
2 holds, due to the triangle inequality property. □


Proof: Theorem 6 - Algorithm for Min-Star-Partition
has a 3-approximation for the SpltBOpt problem.

Proof. Sketch: This result is a direct derivation of the
previous work [17]. Previous work [17] shows that Min-Star-Partition
obtains a 3-approximation factor for the Minimum k-cut
problem. Recall that SpltBOpt is derived from Minimum k-cut
by setting each partition size (possibly except the last one)
to be equal, with $K$ nodes, giving rise to a total number of
$\lceil n/K \rceil$ partitions. After that, the result from [17] directly
holds. □


Proof: Lemma 4 - Min-Star-Partition is polynomial.

Proof. It can be shown that Min-Star-Partition takes
$O(n^{x+1})$ time, as there are $O(n^x)$ distinct transportation
problem instances (corresponding to each one of the $\binom{n}{x}$
combinations), and each instance can be solved in $O(n)$ time [17].
Since $x$ is a constant, the overall running time is
polynomial. □


C. USER STUDY DETAILS 

This section of the appendix provides additional
results of the user studies in Section 7.1. We present
partial results on the distribution of workers’ profiles for both
applications. Additionally, the Stage-2 results of the
collaborative document writing application are presented here.








                                                      Average Rating
Task           Algorithm            Completeness  Grammar  Neutrality  Clarity  Timeliness  Added-value
MOOCs          Optimal-CDW          4.6           4.5      4.3         4.3      4.3         3.7
MOOCs          Aff-Unaware-CDW      4.1           4.2      4.2         3.9      3.9         3.0
MOOCs          CrtMass-Optimal-10   4.0           4.1      4.2         3.9      3.9         3.5
Smartphone     Optimal              4.8           4.6      4.7         4.1      4.2         4.2
Smartphone     Aff-Unaware          4.1           4.1      4.2         4.2      3.9         3.3
Smartphone     CrtMass-Optimal-10   4.0           3.9      3.8         4.1      3.9         3.3
Top-10 places  Optimal              4.4           4.2      4.3         4.2      4.3         4.3
Top-10 places  Aff-Unaware          3.9           3.8      3.7         3.6      3.3         2.9
Top-10 places  CrtMass-Optimal-10   3.9           4.0      4.1         4.0      3.9         3.9



Table 5: Stage 3 results of the document writing application in Section 7.1. Quality assessment of the completed tasks of Stage-2
is performed by a new set of 60 AMT workers on a scale of 1-5. For all three tasks, the results clearly demonstrate that effective
collaboration leads to better task quality. Even though all three groups (assigned to the same task) surpass the skill threshold and satisfy
the wage limit, our proposed formalism Optimal enables better team collaboration, resulting in higher-quality articles.



(a) Worker Skill distribution (b) Worker wage distribution (c) Distance Distribution-Region (d) Distance Distribution-Age

Figure 17: Worker profile distributions for the Sentence Translation Tasks in Section 7.1



(a) Worker Skill distribution (b) Worker wage distribution (c) Worker distance distribution (d) Strong positive correlation between worker skill and wage

Figure 18: Worker profile distributions for the Collaborative Document Writing in Section 7.1


























































































